Flexible header alteration in network devices

ABSTRACT

At least a packet header of a packet received by a network device is provided to a programmable header alteration engine that includes a hardware input processor implemented in hardware and a programmable header alteration processor configured to execute computer readable instructions stored in a program memory. The hardware input processor determines whether the packet header is to be provided to a processing path coupled to the programmable header alteration processor or to be diverted to a bypass path that bypasses the programmable header alteration processor, and the packet header is provided to the processing path or to the bypass path based on the determination. The packet header is selectively i) processed by the programmable header alteration processor when the packet header is provided to the processing path and ii) not processed by the programmable header alteration processor when the packet header is provided to the bypass path.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/773,772, entitled “Flexible Header Alteration in Network Devices,” filed on Jan. 27, 2020, which claims the benefit of U.S. Provisional Patent Application No. 62/798,240, entitled “Programmable Header Alteration,” filed on Jan. 29, 2019. Both of the applications referenced above are hereby incorporated by reference herein in their entireties.

FIELD OF TECHNOLOGY

The present disclosure relates generally to network devices such as network switches, bridges, routers, etc., and more particularly, to processing packets in network devices.

BACKGROUND

Network devices, such as bridges and routers, forward packets through a network based on addresses in headers of packets. A network device typically includes a plurality of ports coupled to different network links. The network device typically receives a packet via one port and processes a header of the packet at least to decide via which other port or ports the network device should transmit the packet. The network device then forwards the packet to the determined one or more other ports.

During processing of the packets, network devices often perform header alterations to modify headers of at least some of the packets prior to transmission of the packets. Conventionally, header alteration is performed using hardware engines that are able to perform header modification operations at wire speed. Such hardware header alteration engines typically implement certain pre-determined header alteration operations and lack flexibility for supporting various types of headers and versatile header alteration processing sets. While software implementations of header alteration engines that utilize processors to perform header alteration based on programmable instructions are more flexible, pure software implementations cannot support sufficiently high packet rates and/or cannot support sufficiently varied header alteration operations.

SUMMARY

In an embodiment, a method for processing packets in a network device includes: receiving, at a packet processor of the network device, a packet received by the network device from a network link; determining, with the packet processor, at least one egress interface via which the packet is to be transmitted by the network device; providing at least a packet header of the packet to a programmable header alteration engine of the packet processor, the programmable header alteration engine including i) a hardware input processor implemented in hardware and ii) a programmable header alteration processor coupled to a program memory, the programmable header alteration processor being configured to execute computer readable instructions stored in the program memory to perform one or more header alteration operations on received packets; determining, with the hardware input processor of the programmable header alteration engine, whether the packet header is to be provided to a processing path coupled to the programmable header alteration processor or to be diverted to a bypass path that bypasses the programmable header alteration processor; providing, with the hardware input processor of the programmable header alteration engine, the packet header to the processing path or to the bypass path based on the determination of whether the packet header is to be provided to the processing path or to be diverted to the bypass path; selectively i) processing the packet header by the programmable header alteration processor when the packet header is provided to the processing path and ii) not processing the packet header by the programmable header alteration processor when the packet header is provided to the bypass path; and transmitting, with the network device, the packet via the at least one egress interface of the network device.

In another embodiment, a network device comprises a packet processor configured to i) receive a packet from a network link and ii) determine at least one egress interface via which the packet is to be transmitted by the network device, and a programmable header alteration engine including i) a hardware input processor implemented in hardware and ii) a programmable header alteration processor coupled to a program memory, the programmable header alteration processor configured to execute computer readable instructions stored in the program memory to perform one or more header alteration operations on received packets. The hardware input processor is configured to determine whether a packet header of the packet is to be provided to a processing path coupled to the programmable header alteration processor or to be diverted to a bypass path that bypasses the programmable header alteration processor, and provide the packet header to the processing path or to the bypass path based on the determination of whether the packet header is to be provided to the processing path or to be diverted to the bypass path. The programmable header alteration processor is configured to selectively i) process the packet header when the packet header is provided to the processing path and ii) not process the packet header when the packet header is provided to the bypass path. The packet processor is further configured to cause the packet to be transmitted via the at least one egress interface of the network device.

In yet another embodiment, a method for processing packets in a network device includes: receiving, at a packet processor of the network device, a packet received by the network device from a network link; determining, with the packet processor, at least one egress interface via which the packet is to be transmitted by the network device; processing a packet header of the packet with a programmable header alteration processor coupled to a program memory, the programmable header alteration processor being configured to execute computer readable instructions stored in the program memory to perform one or more header alteration operations on received packets, the processing including triggering a hardware checksum accelerator engine to calculate a checksum for a bit string corresponding to at least a portion of the packet header, wherein triggering the hardware checksum accelerator engine includes i) partitioning the bit string into a plurality of segments of the bit string and ii) transferring the plurality of segments of the bit string to the hardware checksum accelerator engine; incrementally calculating, with the hardware checksum accelerator engine, the checksum at least by incrementally summing the respective segments, among the plurality of segments of the bit string, transferred to the hardware checksum accelerator engine; and transmitting, via the at least one egress interface of the network device, the packet with a modified header that includes the checksum.

In still another embodiment, a network device comprises a packet processor configured to i) receive a packet from a network link, the packet including a packet header and a payload and ii) determine at least one egress interface via which the packet is to be transmitted by the network device, and a programmable header alteration processor coupled to a program memory, the programmable header alteration processor configured to execute computer readable instructions stored in the program memory to perform one or more header alteration operations on packet headers of received packets, the programmable header alteration processor being configured to, during processing of a packet header of a received packet, trigger a hardware checksum accelerator engine to calculate a checksum for a bit string corresponding to at least a portion of the packet header. The programmable header alteration processor is configured to partition the bit string into a plurality of segments of the bit string, and transfer the plurality of segments of the bit string to the hardware checksum accelerator engine. The hardware checksum accelerator engine is configured to incrementally calculate the checksum at least by incrementally summing the respective segments, among the plurality of segments of the bit string, transferred to the hardware checksum accelerator engine. The packet processor is further configured to cause the packet with a modified header that includes the checksum to be transmitted via the at least one egress interface of the network device.
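
As merely an illustrative sketch of the segmented checksum calculation summarized above, the following Python model partitions a bit string into segments and sums the segments incrementally. The disclosure does not specify the checksum algorithm at this point; the sketch assumes the 16-bit ones' complement Internet checksum. The names `ChecksumAccelerator`, `push_segment`, `partition` and `checksum_in_segments`, and the 4-byte segment size, are hypothetical and are not taken from the disclosure.

```python
class ChecksumAccelerator:
    """Hypothetical software stand-in for the hardware checksum accelerator engine.

    Assumption: a 16-bit ones' complement (Internet-style) checksum.
    """

    def __init__(self) -> None:
        self._sum = 0  # running ones' complement sum, kept within 16 bits

    def push_segment(self, segment: bytes) -> None:
        """Incrementally add one transferred segment to the running sum."""
        if len(segment) % 2:
            segment += b"\x00"  # pad an odd-length final segment
        for i in range(0, len(segment), 2):
            self._sum += (segment[i] << 8) | segment[i + 1]
            # Fold the carry back in (end-around carry).
            self._sum = (self._sum & 0xFFFF) + (self._sum >> 16)

    def result(self) -> int:
        """Return the checksum: the complement of the running sum."""
        return ~self._sum & 0xFFFF


def partition(bit_string: bytes, segment_size: int) -> list:
    """Split the bit string into segments of at most segment_size bytes."""
    if segment_size <= 0 or segment_size % 2:
        raise ValueError("segment_size must be a positive even number of bytes")
    return [bit_string[i:i + segment_size]
            for i in range(0, len(bit_string), segment_size)]


def checksum_in_segments(bit_string: bytes, segment_size: int = 4) -> int:
    """Partition the bit string, transfer each segment, and read the result."""
    accelerator = ChecksumAccelerator()
    for segment in partition(bit_string, segment_size):
        accelerator.push_segment(segment)
    return accelerator.result()
```

Because ones' complement addition is associative, the result under this assumption does not depend on the segment size chosen, which is what allows the sum to be accumulated one segment at a time.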

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an example network device that includes a programmable header alteration engine, according to an embodiment.

FIG. 2 is a block diagram of an input processing unit of the programmable header alteration engine of the network device of FIG. 1, according to an embodiment.

FIG. 3 is a flow diagram of a method for determining whether a packet is to bypass header alteration by the programmable header alteration engine of FIG. 1, according to an embodiment.

FIG. 4A is a diagram illustrating a process performed by the programmable header alteration engine of FIG. 1 to extract a header alteration processor accessible header from a packet header, according to an embodiment.

FIG. 4B is a diagram of a process performed to generate a processed packet header after a header alteration accessible header is processed by the programmable header alteration engine of FIG. 1, according to an embodiment.

FIG. 5 is a block diagram of a header alteration processor of the programmable header alteration engine of FIG. 1, according to an embodiment.

FIG. 6 is a block diagram of an example packet processing node included in the header alteration processor of FIG. 5, according to an embodiment.

FIG. 7 is a block diagram of an example memory map used by a processor of the packet processing node of FIG. 6, according to an embodiment.

FIGS. 8A-B are diagrams illustrating several example accelerator engine trigger instructions issued by a processor of the packet processing node of FIG. 7, according to an embodiment.

FIG. 9 is a flow diagram of an example method for processing packets in the network device of FIG. 1, according to an embodiment.

FIG. 10 is a flow diagram of an example method for processing packets in the network device of FIG. 1, according to another embodiment.

DETAILED DESCRIPTION

In embodiments described below, a network device includes a programmable header alteration engine that includes a programmable header alteration processor coupled to a program memory and configured to implement header alteration operations by executing computer readable instructions stored in the program memory. Generally, because header alteration is performed using a programmable header alteration processor executing computer readable instructions stored in the program memory, the header alteration engine is flexibly programmable/configurable to perform necessary header alteration operations depending, for example, on particular packets being processed and/or particular scenarios of employment of the network device.

In various embodiments described herein, the programmable header alteration engine includes various capabilities that allow the programmable header alteration engine to support header alteration at wire speed. For example, the programmable header alteration engine includes capability to dynamically bypass header alteration for some packets that may not require header alteration by the programmable header alteration processor, thereby allowing these packets to quickly go through the programmable header alteration engine while also increasing processing power available for processing packets that require header alteration by the programmable header alteration processor, in an embodiment. To quickly and efficiently process packet headers that require header alteration by the programmable header alteration engine, the programmable header alteration engine is configured to perform one or more of i) select one or more portions of a packet header, and provide the selected portions, rather than the entire packet header, to a processor for performing header alteration operations on the packet header, ii) segment the packet headers (or the selected one or more portions of the packet header) so that the one or more portions of the packet header can be efficiently transferred to the processor, iii) identify specific instruction threads for processing the one or more portions of the packet header, and provide an indication of the identified instruction thread to the processor so that execution of the instructions can begin more quickly (e.g., immediately) upon receipt of the packet header by the processor, iv) efficiently distribute the packet headers among a plurality of packet processing nodes of the programmable header alteration processor in a manner that allows packets that require processing by relatively slower (e.g., longer) processing threads to not block packets that require processing by relatively faster (e.g., shorter) processing threads, v) provide packet headers of different packets to different packet processing nodes of the programmable header alteration processor in parallel for concurrent processing of the packet headers, vi) efficiently trigger external accelerator engines to perform certain processing operations during processing of the packet headers, etc. These and other mechanisms described herein allow the programmable header alteration engine to provide the flexibility of programmable header alteration while also allowing the header alterations to be performed sufficiently quickly, for example to support header alteration at packet wire speed.

FIG. 1 is a simplified block diagram of an example network device 100 configured to utilize packet header alteration techniques of the present disclosure, according to an embodiment. The network device 100 generally forwards packets among two or more computer systems, network segments, subnets, etc. For example, the network device 100 is a router, in one embodiment. It is noted, however, that the network device 100 is not necessarily limited to a particular protocol layer or to a particular networking technology (e.g., Internet Protocol). For instance, in other embodiments, the network device 100 is suitably a bridge, a switch, a virtual private network (VPN) concentrator, etc.

The network device 100 includes a plurality of network interfaces (e.g., ports) 102 configured to couple to respective network links. Although six network interfaces 102 are illustrated in FIG. 1, the network device 100 includes any suitable number of network interfaces 102 in various embodiments. The network device 100 also includes a packet processor 104 coupled to the network interfaces 102. Packets received by the network device 100 are provided to the packet processor 104 for processing of the packets, in an embodiment. In some embodiments, packets are stored in a packet memory (not shown) and data units corresponding to the packets, rather than the packets themselves, are provided to the packet processor 104 for processing of the packets. In some embodiments, portions of the packets, such as packet headers, are provided to the packet processor 104 in addition to or instead of packet descriptors, while remaining portions of the packets are stored in the packet memory. For ease of explanation, the term “packet” is used herein to refer to any combination of a packet itself, a packet header of the packet, and a packet descriptor corresponding to the packet.

Generally speaking, the packet processor 104 is configured to receive and process packets received via ingress network interfaces 102, to determine respective egress network interfaces 102 via which the packets are to be transmitted, and to cause the packets to be transmitted via the determined egress network interfaces 102. In an embodiment, the packet processor 104 includes a forwarding engine 106 and a packet classifier 108. The forwarding engine 106 is configured to analyze header information in packets received via network interfaces 102 to determine network interfaces 102 via which the packets are to be transmitted (referred to herein as “target ports”). As merely an illustrative example, the forwarding engine 106 is configured to use a destination address in a header of a packet to perform a lookup in a forwarding database (not shown), which stores correspondences between destination addresses and network interfaces 102, to determine a particular network interface 102 via which the packet is to be transmitted. As another illustrative example, the forwarding engine 106 is configured to use a VLAN ID in a header of a packet to perform a lookup in a forwarding database (not shown) (e.g., the same forwarding database discussed above or a different forwarding database), which stores correspondences between VLAN IDs and network interfaces 102, to determine a particular set of target network interfaces 102 for the packet. The forwarding engine 106 is configured to store an ID of a target port (or set of multiple target ports) in the packet descriptor corresponding to the packet, according to an embodiment.
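
As merely an illustrative sketch of the two forwarding database lookups described above, the following Python model keeps one table keyed by destination address and one keyed by VLAN ID, and stores the resulting target port or ports in a packet descriptor. The class and field names (`PacketDescriptor`, `ForwardingEngine`, `forward_by_address`, `forward_by_vlan`) and the use of plain dictionaries as the forwarding database are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PacketDescriptor:
    """Hypothetical packet descriptor holding the fields used by this sketch."""
    dest_addr: str
    vlan_id: int
    target_ports: Set[int] = field(default_factory=set)


class ForwardingEngine:
    """Hypothetical model of the forwarding engine and its forwarding database."""

    def __init__(self, addr_table: Dict[str, int],
                 vlan_table: Dict[int, Set[int]]) -> None:
        self.addr_table = addr_table  # destination address -> network interface
        self.vlan_table = vlan_table  # VLAN ID -> set of network interfaces

    def forward_by_address(self, desc: PacketDescriptor) -> Set[int]:
        """Look up a single target port using the destination address."""
        port = self.addr_table.get(desc.dest_addr)
        desc.target_ports = set() if port is None else {port}
        return desc.target_ports

    def forward_by_vlan(self, desc: PacketDescriptor) -> Set[int]:
        """Look up a set of target ports using the VLAN ID."""
        desc.target_ports = set(self.vlan_table.get(desc.vlan_id, ()))
        return desc.target_ports
```

For example, a descriptor with destination address `"00:11:22:33:44:55"` resolves to the single port stored for that address, whereas a lookup by VLAN ID yields every port configured for that VLAN.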

The packet classifier 108 is configured to determine to which packet flow a packet belongs by analyzing at least information in a packet header of the packet, according to an embodiment. In some embodiments, the packet classifier 108 is configured to determine to which packet flow a packet belongs by additionally or alternatively analyzing other information associated with the packet, for example an identifier of a network interface 102 via which the packet was received. A packet flow corresponds to packets sharing certain characteristics. As one example, some flows, such as Internet Protocol (IP) transmission control protocol (TCP)/user datagram protocol (UDP) flows, are typically defined in the networking industry by a 5-tuple such as {destination IP address, source IP address, L4 Protocol, UDP/TCP destination port, UDP/TCP source port}. As another example, a particular packet flow may be defined as packets with headers having a particular source address and a particular destination address, in an embodiment. In various embodiments, a packet flow may be defined as packets with headers having particular common information such as one or more of i) a particular source address, ii) a particular destination address, iii) a particular virtual local area network (VLAN) identifier (ID), iv) a particular priority, v) a particular packet type, etc. The packet classifier 108 is configured to assign respective flow IDs to at least some packets, where the flow IDs indicate the respective flows to which packets belong, according to an embodiment. The packet classifier 108 is configured to store the flow ID assigned to a packet in the packet descriptor corresponding to the packet, according to an embodiment.
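
As merely an illustrative sketch of flow classification by the 5-tuple described above, the following Python model assigns a flow ID to each distinct 5-tuple and stores the flow ID in a packet descriptor. The names `FiveTuple` and `PacketClassifier`, the sequential numbering of flow IDs, and the use of a dictionary as the packet descriptor are assumptions made for illustration only.

```python
from typing import Dict, NamedTuple


class FiveTuple(NamedTuple):
    """The 5-tuple that commonly defines an IP TCP/UDP flow."""
    dst_ip: str
    src_ip: str
    l4_protocol: int
    dst_port: int
    src_port: int


class PacketClassifier:
    """Hypothetical classifier that maps each distinct 5-tuple to a flow ID."""

    def __init__(self) -> None:
        self._flow_ids: Dict[FiveTuple, int] = {}

    def classify(self, key: FiveTuple, descriptor: dict) -> int:
        """Assign (or reuse) a flow ID and store it in the packet descriptor."""
        # Assumption: flow IDs are handed out sequentially, starting at 0.
        flow_id = self._flow_ids.setdefault(key, len(self._flow_ids))
        descriptor["flow_id"] = flow_id
        return flow_id
```

Two packets that share all five fields receive the same flow ID, while a packet that differs in any one field (for example, the source port) is assigned a new flow ID.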

In some embodiments, the packet classifier 108 (or another unit) is configured to additionally determine a type of a packet, for example by analyzing an initial portion of a packet header of the packet and/or information obtained other than from the packet header, such as, for example, the identifier of the source port via which the packet was received by the network device 100, the identifier of the target port (or set of multiple target ports) via which the packet is to be transmitted by the network device 100, the flow ID assigned to the packet, etc. The packet classifier 108 (or the other unit) determines a packet type of a packet based on one or more of a destination of the packet, a flow to which the packet belongs, an EtherType of the packet, whether the packet is a virtual local area network (VLAN) tagged packet, etc., in various embodiments. The packet classifier 108 (or the other unit) is configured to assign respective packet type IDs to at least some packets, where the packet type IDs indicate the packet types of the packets, according to an embodiment. The packet classifier 108 (or the other unit) is configured to store the packet type ID assigned to a packet in the packet descriptor corresponding to the packet, according to an embodiment.

The packet processor 104 further includes a programmable header alteration engine 110, in an embodiment. The programmable header alteration engine 110 is configured to perform header alteration operations to modify headers of at least some of the packets being processed by the packet processor 104. The header alteration operations performed by the programmable header alteration engine 110 include, for example, setting an explicit congestion notification (ECN) mark in a header of a packet, setting a channelized flow control indication in a header of a packet, inserting or updating a timestamp in a header of a packet, adding a forwarding tag to a header of a packet, removing a forwarding tag from a header of a packet, updating a forwarding tag in a header of a packet, updating a transparent clock correction field in a header of a packet, changing a next hop address in a header of a packet, adding an encapsulating header, removing an encapsulating header, etc., in various embodiments.

The programmable header alteration engine 110 includes a programmable header alteration processor 112. In an embodiment, the programmable header alteration processor 112 includes a packet processing array (PPA) having a plurality of packet processing nodes 114 coupled to a program memory 116. Each packet processing node 114 of the programmable header alteration processor 112 includes a packet header processor (e.g., a small general purpose central processing unit (CPU)) configured to execute computer readable instructions stored in the program memory 116 to perform header alteration operations, in an embodiment. In an embodiment, the small CPU included in each packet processing node 114 is a CPU available from Tensilica (e.g., Xtensa LX5 CPU) or other suitable CPU (e.g., ARC, RISC-V, ARM, MIPS, etc.). In an embodiment, the CPU comprises a single instruction interface for reading instructions from a program memory (e.g., the program memory 116), 7 processing pipe stages for implementing the instructions, and a single read/write interface for interacting with a data memory. In an embodiment, the CPU additionally comprises one or more logic control units (LCUs) and several (e.g., 8, 16, 32, etc.) registers, such as 32-bit registers or registers of other suitable sizes. In other embodiments, other suitable CPU architectures are utilized.

In some embodiments, the programmable header alteration engine 110 additionally includes or is coupled to a purely hardware header alteration engine (not shown) configured to perform header alteration operations in hardware. The purely hardware header alteration engine is coupled, for example, to the input of the programmable header alteration engine 110 or the output of the programmable header alteration engine 110, in various embodiments. In such embodiments, the purely hardware header alteration engine is hardwired to implement certain header alteration operations, whereas the programmable header alteration processor 112 is configurable to implement other header alteration operations that are not hardwired into the purely hardware header alteration engine.

The programmable header alteration engine 110 also includes an input processor 118, a unified bypass buffer 120, an output processor 122 and a stream merger 124, in an embodiment. The input processor 118 is implemented in hardware, in an embodiment, and the input processor 118 is sometimes referred to herein as a “hardware input processor” 118. In an embodiment, the hardware input processor 118 is implemented using one or more integrated circuits configured to operate as described herein. In other embodiments, the input processor 118 is implemented in other suitable manners. The input processor 118 is configured to perform pre-processing of packet headers provided to the programmable header alteration engine 110, in an embodiment. Pre-processing of a packet header includes determining a processing thread identifier (thread ID) that indicates the type of processing that is to be performed with respect to the packet header by the programmable header alteration engine 110, in an embodiment. In an embodiment, the input processor 118 determines the thread ID based on a packet flow with which the packet is associated and/or based on a packet type of the packet. In an embodiment, the input processor 118 is coupled to a configuration memory 119 that stores associations between packet flows and/or packet types and thread IDs, and the input processor 118 is configured to retrieve, from the configuration memory 119, a thread ID corresponding to a packet based on a packet flow ID associated with the packet and/or a packet type ID associated with the packet. Pre-processing of a packet header also includes making a bypass decision for the packet header to determine whether the packet header is to be provided for processing to the programmable header alteration processor 112 or to bypass processing by the programmable header alteration processor 112, in an embodiment.
In an embodiment, the input processor 118 is configured to make the bypass decision based at least in part on a packet flow and/or packet type associated with the packet. In an embodiment, the input processor 118 is configured to make the bypass decision based at least in part on the thread ID obtained for the packet and/or based on other configuration attributes (e.g., statistical attributes) that the input processor 118 may obtain for the packet from the configuration memory 119. In some embodiments, the input processor 118 makes the bypass decision additionally or alternatively based on a congestion state of the programmable header alteration processor 112. An example method implemented by the input processor 118 to make a bypass decision, according to an embodiment, is described in more detail below with reference to FIG. 3.
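
As merely an illustrative sketch of the pre-processing described above, the following Python model retrieves a thread ID from a configuration memory keyed by flow ID and packet type ID, and then makes a bypass decision based on the thread ID and a congestion state. The specific rules used here are assumptions made for illustration and are not taken from the disclosure: a reserved thread ID of 0 is assumed to mean that no processing is required, and a hypothetical per-entry attribute `may_bypass_when_congested` is assumed to permit bypass when the programmable header alteration processor is congested.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

NO_PROCESSING_THREAD = 0  # assumption: reserved thread ID meaning "bypass"


@dataclass(frozen=True)
class ConfigEntry:
    """Hypothetical configuration memory entry."""
    thread_id: int
    may_bypass_when_congested: bool = False


class InputProcessor:
    """Hypothetical software model of the hardware input processor."""

    def __init__(self, config_memory: Dict[Tuple[int, int], ConfigEntry]) -> None:
        # Keyed by (flow ID, packet type ID).
        self.config_memory = config_memory

    def lookup(self, flow_id: int, packet_type_id: int) -> ConfigEntry:
        """Retrieve the entry for a packet; unknown packets need no processing."""
        return self.config_memory.get(
            (flow_id, packet_type_id), ConfigEntry(NO_PROCESSING_THREAD))

    def is_bypass(self, flow_id: int, packet_type_id: int,
                  congested: bool) -> bool:
        """Return True if the packet header is to be diverted to the bypass path."""
        entry = self.lookup(flow_id, packet_type_id)
        if entry.thread_id == NO_PROCESSING_THREAD:
            return True
        return congested and entry.may_bypass_when_congested
```

Under these assumptions, a packet whose entry names a processing thread follows the processing path unless the processor is congested and the entry allows the packet to skip processing.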

In an embodiment, when the input processor 118 determines that a packet is a bypass packet for which processing by the programmable header alteration processor 112 is to be bypassed, the input processor 118 writes the packet header to the unified bypass buffer 120. When the input processor 118 determines that a packet is a non-bypass packet that is to be provided to the programmable header alteration processor 112, the input processor 118 provides the packet header to the programmable header alteration processor 112, in an embodiment. In another embodiment, the input processor 118 selects one or more portions of a packet header, and provides the selected one or more portions of the packet header, rather than the entire packet header, to the programmable header alteration processor 112. In some embodiments, the input processor 118 additionally generates metadata corresponding to the packet and provides the metadata along with the selected one or more portions of the packet header to the programmable header alteration processor 112. The metadata includes an identifier of a processing thread (e.g., the thread ID) to be used by the programmable header alteration processor 112 for processing of the packet header, in an embodiment. In some embodiments, the metadata additionally includes other information to be used by the programmable header alteration processor 112 for processing of the packet header.
In an embodiment, the metadata is provided to the programmable header alteration processor 112 before the packet header is provided to the programmable header alteration processor 112 to allow a processor of a packet processing node 114 of the programmable header alteration processor 112 to retrieve, from the program memory 116, computer readable instructions corresponding to the identified processing thread, based on the indication included in the metadata, before receiving the packet header, so that processing of the packet header can begin immediately when the packet header is received by the processor.

The input processor 118 also writes the entire packet header, or at least the remaining portions of the packet header that were not provided to the programmable header alteration processor 112, to the unified bypass buffer 120, in an embodiment. As will be explained in more detail below, after the selected one or more portions of the packet header are processed by the programmable header alteration processor 112, the entire packet header, or at least the portions of the packet header that were not provided to the programmable header alteration processor 112, are retrieved from the unified bypass buffer 120 so that the packet header can be reconstructed, in an embodiment.

In an embodiment, the input processor 118 is configured to write bypass packets in a queue (e.g., a first in first out (FIFO) queue or a linked list queue) in the unified bypass buffer 120. For example, in an embodiment, the input processor 118 is configured to maintain a linked list of bypass packets stored in the unified bypass buffer 120. In operation, when the input processor 118 is to write a bypass packet to the unified bypass buffer 120, the input processor 118 obtains an identifier of a buffer (“buffer ID”) from a pool of free buffer IDs 121 maintained by the input processor 118, in an embodiment. The free buffer IDs in the pool of free buffer IDs 121 identify free buffer locations in the unified bypass buffer 120, in an embodiment. The input processor 118 then places the buffer ID at the tail of the linked list, and writes the packet header to the identified buffer location in the unified bypass buffer 120, in an embodiment. Because the input processor 118 stores bypass packets in a queue, the programmable header alteration engine 110 maintains the order of the bypass packets, and subsequently outputs the bypass packets in the order in which the packets were received by the programmable header alteration engine 110.

On the other hand, the order of non-bypass packets that are provided to the programmable header alteration processor 112 is maintained, as necessary, by the programmable header alteration processor 112, in an embodiment. Accordingly, the input processor 118 need not maintain queues of non-bypass packets in the unified bypass buffer 120. In an embodiment, when the input processor 118 is to write a processing flow packet to the unified bypass buffer 120, the input processor 118 obtains a buffer ID from the pool of free buffer IDs 121, includes the buffer ID in the metadata generated for the packet, and writes the packet to the identified buffer in the unified bypass buffer 120. Subsequently, when processing of the selected one or more portions of the packet header is completed by the programmable header alteration processor 112, the packet is retrieved from the buffer location indicated by the buffer ID included in the metadata corresponding to the packet, in an embodiment. Because bypass packets are stored in a queue in the unified bypass buffer 120 and the processing flow packets are stored in un-queued buffer locations in the unified bypass buffer 120, the unified bypass buffer 120 acts as a queue buffer (e.g., a FIFO buffer or a linked list buffer) for bypass packets and as a random access memory (RAM) for packets that are processed by the programmable header alteration processor 112, in an embodiment.
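
As merely an illustrative sketch of the dual role of the unified bypass buffer described above, the following Python model draws buffer IDs from a single pool of free buffer IDs, queues the buffer IDs of bypass packets so that the bypass packets are read back in arrival order, and returns the buffer ID of a processing flow packet to the caller so that the packet can later be retrieved by buffer ID. The class and method names are hypothetical, a `deque` stands in for the linked list, and returning a buffer ID to the pool on read is an assumption.

```python
from collections import deque
from typing import Deque, List, Optional


class UnifiedBypassBuffer:
    """Hypothetical model: a queue for bypass packets, a RAM for the others."""

    def __init__(self, num_buffers: int) -> None:
        self._slots: List[Optional[bytes]] = [None] * num_buffers
        self._free_ids: Deque[int] = deque(range(num_buffers))  # free buffer IDs
        self._bypass_queue: Deque[int] = deque()  # stands in for the linked list

    def _alloc(self, header: bytes) -> int:
        if not self._free_ids:
            raise MemoryError("no free buffer IDs")
        buffer_id = self._free_ids.popleft()
        self._slots[buffer_id] = header
        return buffer_id

    def _release(self, buffer_id: int) -> bytes:
        header = self._slots[buffer_id]
        if header is None:
            raise KeyError(f"buffer {buffer_id} is not in use")
        self._slots[buffer_id] = None
        self._free_ids.append(buffer_id)  # assumption: ID is freed on read
        return header

    def write_bypass(self, header: bytes) -> None:
        """Store a bypass packet and place its buffer ID at the queue tail."""
        self._bypass_queue.append(self._alloc(header))

    def read_bypass(self) -> bytes:
        """Retrieve the oldest bypass packet (FIFO order)."""
        return self._release(self._bypass_queue.popleft())

    def write_processing(self, header: bytes) -> int:
        """Store a processing flow packet; the buffer ID goes into its metadata."""
        return self._alloc(header)

    def read_processing(self, buffer_id: int) -> bytes:
        """Retrieve a processing flow packet by buffer ID (random access)."""
        return self._release(buffer_id)
```

In this sketch, processing flow packets can be read back in any order without disturbing the order of the queued bypass packets, because only bypass buffer IDs are linked into the queue.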

With continued reference to FIG. 1, packet headers that are provided to the programmable header alteration processor 112 are distributed among the processing nodes 114 for processing of the packets. In an embodiment, the programmable header alteration processor 112 implements a re-cycling mechanism to distribute packet headers among the processing nodes 114 and to subsequently obtain processed packet headers from the processing nodes 114 so that order within a packet flow is maintained as the packets flow through the programmable header alteration processor 112. In an embodiment, as will be described in more detail below, the re-cycle mechanism allows packet flows that require relatively shorter processing times by the programmable header alteration processor 112 to not be blocked by packet flows requiring relatively longer processing times by the programmable header alteration processor 112 while maintaining packet order within the packet flows.

The output processor 122 is configured to perform post-processing of the packets that are processed by the programmable header alteration processor 112, in an embodiment. The output processor 122 is implemented in hardware, in an embodiment, and the output processor 122 is sometimes referred to herein as a “hardware output processor” 122. In an embodiment, the hardware output processor 122 is implemented using one or more integrated circuits configured to operate as described herein. In other embodiments, the output processor 122 is implemented in other suitable manners. The output processor 122 is configured to perform one or more checks on the processed packets to detect errors that may have resulted from processing of the packets by the programmable header alteration processor 112, in an embodiment. The output processor 122 is configured to retrieve the corresponding packet from the unified bypass buffer 120 and to reconstruct the packet header by incorporating the processed one or more portions of the packet header into the packet header retrieved from the unified bypass buffer 120, in an embodiment.

The stream merger 124 merges packet streams output by the output processor 122 with bypass packet streams retrieved from the unified bypass buffer 120, in an embodiment. The programmable header alteration engine 110 provides the merged flow for further processing of the packets and/or transmission of the packets from the network device 100, in an embodiment. For example, the programmable header alteration engine 110 provides the merged flow to an egress queuing and shaping unit 128 which queues the packets in egress queues corresponding to egress network interfaces 102 via which the packets are to be transmitted, and subsequently causes the packets to be transmitted via the corresponding egress network interfaces 102, in an embodiment.

Generally, because the programmable header alteration engine 110 is configured to perform header alteration functions using processors executing computer readable instructions, the programmable header alteration engine 110 is flexibly configurable to perform necessary header alteration operations with respect to various packets being processed by the network device 100, for example, at least compared to systems in which packet header alteration is implemented entirely in hardware, in at least some embodiments. Further, in various embodiments, the programmable header alteration engine 110 includes various capabilities described herein that allow the programmable header alteration engine 110 to provide this flexibility while performing the necessary header alterations sufficiently quickly, for example to support packet wire speed.

FIG. 2 is a block diagram of an input processor 200 suitable for use with a programmable header alteration engine of a network device, according to an embodiment. In an embodiment, the input processor 200 corresponds to the input processor 118 of the programmable header alteration engine 110 of the network device 100 of FIG. 1. For exemplary purposes, the input processor 200 is described below with reference to the programmable header alteration engine 110 of the network device 100 of FIG. 1. However, the input processor 200 is used with header alteration engines different from the programmable header alteration engine 110 and/or with network devices different from the network device 100, in other embodiments.

The input processor 200 includes a thread identifier 204. The thread identifier 204 is configured to determine a thread ID for a packet header based at least in part on a flow and/or a type of the packet. In an embodiment, the thread identifier 204 is configured to access a thread ID table 212 stored in a memory 210, and to obtain a thread ID for the packet from the thread ID table 212. In an embodiment, the thread identifier 204 is configured to identify an entry in the thread ID table 212 based on one of, or a combination of two or more of, i) a source port of a packet, ii) a target port of the packet, iii) a flow of the packet, iv) a packet type, etc., and to retrieve a thread ID from the identified entry in the thread ID table 212.

The input processor 200 also includes a bypass decision engine 206 configured to determine whether the packet is to be provided via a processing path to the programmable header alteration processor 112 or to be diverted to a bypass path that bypasses the programmable header alteration processor 112, in an embodiment.

FIG. 3 is a flow diagram of a method 300 for determining whether a packet is to be provided via a processing path to a programmable header alteration processor or to be diverted to a bypass path that bypasses the header alteration processor, according to an embodiment. In an embodiment, the method 300 is implemented by the input processor 200 of FIG. 2. For example, at least a portion of the method 300 is performed by the bypass decision engine 206 of the input processor 200 of FIG. 2, in an embodiment. For ease of explanation, the method 300 is described with reference to the input processor 200 of FIG. 2. However, the method 300 is implemented by a suitable processing unit different from the input processor 200 of FIG. 2, in some embodiments.

At block 302, the input processor 200 (e.g., the bypass decision engine 206) determines whether the thread ID associated with the packet indicates that the packet belongs to a bypass flow that is to bypass the programmable header alteration processor 112. For example, the bypass decision engine 206 checks whether the thread ID associated with the packet is equal to a particular value (e.g., thread ID=0) that indicates that the packet belongs to a bypass flow. If the bypass decision engine 206 determines at block 302 that the packet belongs to a bypass flow, then the method 300 terminates at block 304 with a decision that the packet is to bypass the programmable header alteration processor 112.

On the other hand, if the bypass decision engine 206 determines at block 302 that the packet belongs to a processing flow, then the input processor 200 (e.g., the first stage thread configuration unit 208) obtains further bypass configuration information, and the bypass decision engine 206 proceeds to further determine whether the packet is to be diverted to the bypass path based on the bypass configuration information, in an embodiment. To obtain the further bypass configuration information, in an embodiment, the first stage thread configuration unit 208 accesses a bypass thread configuration table 214 in the memory 202 based on the thread ID associated with the packet, and retrieves from the bypass thread configuration table 214 one or more bypass attributes configured for the corresponding processing thread, such as i) a statistical factor configured for the corresponding processing thread, ii) a statistical processing mode configured for the corresponding processing thread and iii) a stall mode configured for the corresponding processing thread.

The method 300 then proceeds to block 306, at which the bypass decision engine 206 obtains (e.g., from a pseudorandom number generator included in or coupled to the bypass decision engine 206) a pseudorandom-generated number (PRGN), and compares the PRGN with the statistical factor of the corresponding processing thread. Based on the comparison between the PRGN and the statistical factor of the corresponding processing thread, the bypass decision engine 206 determines whether the packet is to be statistically selected to bypass the programmable header alteration processor 112. For example, if the bypass decision engine 206 determines at block 306 that the PRGN is greater than or equal to the statistical factor of the corresponding processing thread, this indicates that the packet is to be statistically selected for header alteration by the programmable header alteration processor 112, and the method 300 proceeds to block 308, in an embodiment. On the other hand, if the bypass decision engine 206 determines at block 306 that the PRGN is less than the statistical factor of the corresponding processing thread, this indicates that the packet is to be statistically selected to bypass the programmable header alteration processor 112, and the method 300 continues at block 310, in an embodiment.

At block 310, the bypass decision engine 206 determines whether statistical processing mode is enabled for the corresponding processing thread based on the statistical processing mode configured for the corresponding processing thread. If the bypass decision engine 206 determines at block 310 that the statistical processing mode is enabled for the corresponding processing thread (e.g., if the statistical processing mode is set to a logic 1), then the method 300 terminates at the bypass block 304, thereby statistically diverting the packet to the bypass path. On the other hand, if the bypass decision engine 206 determines at block 310 that statistical processing mode is not enabled for the corresponding processing thread (e.g., if the statistical processing mode is set to a logic 0), then the bypass decision engine 206 does not statistically select the packet to be diverted to the bypass path, and the method 300 proceeds to a block 311 at which the thread ID corresponding to the packet is remapped to a “do nothing” thread, in an embodiment. The method 300 then returns to block 308, in an embodiment.

At block 308, the bypass decision engine 206 determines a congestion level of the programmable header alteration processor 112. In an embodiment, the bypass decision engine 206 determines the congestion level based on a feedback signal (e.g., a back-pressure signal) from the programmable header alteration processor 112. If the bypass decision engine 206 determines that the programmable header alteration processor 112 is not busy or that the congestion level of the programmable header alteration processor 112 is sufficiently low, then the method 300 terminates at block 318 with a decision to provide the packet to the programmable header alteration processor 112. On the other hand, if the bypass decision engine 206 determines that the programmable header alteration processor 112 is busy or that the congestion level of the programmable header alteration processor 112 is not sufficiently low, then the method 300 proceeds to blocks 312-316 at which further determinations are made as to whether or not the packet is to bypass the programmable header alteration processor 112, in an embodiment.

At block 312, the bypass decision engine 206 determines, based on the stall mode configured for the corresponding processing thread, whether the processing thread allows packets to bypass header alteration if the programmable header alteration processor 112 is busy. If the bypass decision engine 206 determines at block 312 that the processing thread does not allow packets to bypass header alteration (e.g., if thread stall mode ID is set to a logic 00), then the method 300 terminates at block 318 with the decision that the packet is to be provided to the programmable header alteration processor 112. On the other hand, if the bypass decision engine 206 determines at block 312 that the processing thread allows packets to bypass header alteration processing (e.g., if thread stall mode ID is not set to a logic 00), then the method 300 proceeds to block 314 at which the bypass decision engine 206 determines, based on the stall mode of the processing thread, whether packets that bypass processing by the programmable header alteration processor 112 are to be dropped by the network device 100. If the bypass decision engine 206 determines at block 314 that the stall mode of the processing thread indicates that packets associated with the processing thread that do not undergo header alteration by the programmable header alteration processor 112 are to be dropped by the network device 100, then the method 300 terminates at the bypass block 304 with an indication that the packet subsequently is to be dropped by the network device 100.

On the other hand, if the bypass decision engine 206 determines at block 314 that the stall mode of the processing thread indicates that packets associated with the processing thread that do not undergo processing by the programmable header alteration processor 112 are not to be dropped by the network device 100, then the method 300 proceeds to block 316 at which the bypass decision engine 206 determines whether the stall mode of the processing thread indicates that packets that do not undergo processing by the programmable header alteration processor 112 i) are nonetheless to be provided to the programmable header alteration processor 112 to maintain proper order of packets associated with the processing thread as the packets flow through the programmable header alteration engine 110 or ii) are to be redirected to the bypass path and not provided to the programmable header alteration processor 112. If it is determined at block 316 that the packet is nonetheless to be provided to the programmable header alteration processor 112, then the method 300 proceeds to block 317 at which the thread ID corresponding to the packet is remapped to a “do nothing” thread, in an embodiment. The method 300 then terminates at block 318 with the decision to provide the packet to the programmable header alteration processor 112 so that the packet flows through the programmable header alteration processor 112 without being processed by the programmable header alteration processor 112, in an embodiment. On the other hand, if it is determined at block 316 that the packet is to be redirected to the bypass path and not provided to the programmable header alteration processor 112, then the method 300 terminates with the bypass decision at block 304, thereby redirecting the packet to the bypass path.
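The decision flow of the method 300 can be summarized in a short software sketch. This is only an illustration of the branching described above, not the hardware implementation. The names `StallMode`, `ThreadConfig`, `decide` and `DO_NOTHING_THREAD_ID` are hypothetical, and the numeric stall mode encodings other than 0 (the "00" value mentioned above) are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class StallMode(Enum):
    STALL = 0       # "00": packets may not bypass header alteration
    DROP = 1        # assumed encoding: bypassing packets are dropped
    DO_NOTHING = 2  # assumed encoding: keep order via a "do nothing" thread
    BYPASS = 3      # assumed encoding: redirect to the bypass path


@dataclass
class ThreadConfig:
    statistical_factor: int
    statistical_mode: bool
    stall_mode: StallMode


BYPASS_THREAD_ID = 0      # e.g., thread ID = 0 marks a bypass flow
DO_NOTHING_THREAD_ID = 1  # hypothetical ID of the "do nothing" thread


def decide(thread_id, config, processor_busy, prgn):
    """Return (path, thread_id, drop) for one packet."""
    if thread_id == BYPASS_THREAD_ID:                    # block 302
        return ("bypass", thread_id, False)              # block 304
    if prgn < config.statistical_factor:                 # block 306
        if config.statistical_mode:                      # block 310
            return ("bypass", thread_id, False)          # block 304
        thread_id = DO_NOTHING_THREAD_ID                 # block 311
    if not processor_busy:                               # block 308
        return ("process", thread_id, False)             # block 318
    if config.stall_mode is StallMode.STALL:             # block 312
        return ("process", thread_id, False)             # block 318
    if config.stall_mode is StallMode.DROP:              # block 314
        return ("bypass", thread_id, True)               # block 304, then drop
    if config.stall_mode is StallMode.DO_NOTHING:        # block 316
        return ("process", DO_NOTHING_THREAD_ID, False)  # blocks 317, 318
    return ("bypass", thread_id, False)                  # block 304
```

Passing the PRGN and the busy indication as arguments keeps the sketch deterministic; in the described embodiment these come from a pseudorandom number generator and a feedback signal, respectively.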

Referring again to FIG. 2, the input processor 200 also includes a second stage configuration unit 210, a metadata generator 212 and a header extractor 214 configured to perform additional pre-processing operations with respect to packets that are not to bypass processing by the programmable header alteration processor 112, in an embodiment. The second stage configuration unit 210 is configured to access a second stage thread configuration table 216 in the memory 202 based on the thread ID corresponding to the packet, and obtain thread configuration information to be used for processing of the packet by the programmable header alteration processor 112, in an embodiment. Additionally or alternatively, the second stage configuration unit 210 is configured to access a source port configuration table 218 based on an indicator of a source port at which the packet was received and/or a target port configuration table 220 based on an indicator of a target port via which the packet is to be transmitted to obtain configuration information to be used by the programmable header alteration processor 112 for processing packets received via the source port and/or to be transmitted via the target port, in an embodiment. In other embodiments, other suitable configuration information is obtained by the second stage configuration unit 210. The metadata generator 212 is configured to generate metadata for the packet to include at least the configuration information obtained by the second stage configuration unit 210, in an embodiment.

The header extractor 214 is configured to extract one or more portions of a packet header of the packet, and to generate an alteration processor accessible header to include the one or more portions extracted from the packet header, in an embodiment. FIGS. 4A-B are diagrams illustrating, respectively, extraction of one or more portions of a packet header 400 to generate an alteration processor accessible header 402, and generation of a processed packet header 450 after the alteration processor accessible header 402 is processed by the programmable header alteration processor 112, according to an embodiment. In an embodiment, the packet header 400 corresponds to an unaltered header of a received packet being processed by the network device 100. In another embodiment, in which the programmable header alteration engine 110 is preceded by a purely hardware header alteration engine coupled to the input of the programmable header alteration engine 110, the packet header 400 corresponds to a packet header, of a received packet, that is altered by the purely hardware header alteration engine, in at least some scenarios. Referring to FIGS. 1 and 2, the input processor 118 (e.g., the header extractor 214 in FIG. 2) generates the alteration processor accessible header 402, and the output processor 122 subsequently generates the processed packet header 450 after the alteration processor accessible header 402 is processed by the programmable header alteration processor 112, according to an embodiment. By way of example, the alteration processor accessible header extraction and processed header generation illustrated in FIGS. 4A-B are described in the context of the network device 100 of FIG. 1. In other embodiments, the alteration processor accessible header extraction and processed header generation illustrated in FIGS. 4A-B are performed by network devices different from the network device 100 of FIG. 1.

In an embodiment, the input processor 118 identifies an alteration processor accessible header portion 404 to be extracted from the packet header 400 based on an accessible header anchor 406 that indicates a beginning location (e.g., a first bit) and a length of the portion to be extracted from the packet header 400. In an embodiment, the accessible header anchor 406 is associated with a processing thread that is to be used to process the packet header 400. In an embodiment, the input processor 118 obtains the accessible header anchor 406 from the configuration memory 116 based on the thread ID (e.g., determined by the thread identifier 204 of FIG. 2) associated with the packet header 400. The input processor 118 extracts the alteration processor accessible header portion 404 beginning at the location indicated by the accessible header anchor 406 and of the length indicated by the accessible header anchor 406, and generates the alteration processor accessible header 402 to include the alteration processor accessible header portion 404 extracted from the packet header 400, in an embodiment.

The input processor 118 provides the alteration processor accessible header 402 to the programmable header alteration processor 112 for processing, in an embodiment. Additionally, the input processor 118 stores the entire packet header 400 in the unified bypass buffer 120, in an embodiment. In another embodiment, the input processor 118 stores a pre-accessible header portion 410 and a post-accessible header portion 412 of the packet header 400, rather than the entire packet header 400, in the unified bypass buffer 120.

Referring now to FIG. 4B, the programmable header alteration processor 112 processes the alteration processor accessible header 402 to generate a post-processing alteration processor accessible header 454, in an embodiment. The post-processing alteration processor accessible header 452 comprises the alteration processor accessible header 402 altered in accordance with the processing thread with which the packet header 400 is associated, in an embodiment. The post-processing alteration processor accessible header 452 is provided to the output processor 122, and the output processor 122 generates the processed packet header 450 by incorporating the post-processing alteration processor accessible header 454 into the packet header 400, in an embodiment. For example, the output processor 122 retrieves the entire packet header 400 from the unified bypass buffer 120, and replaces the alteration processor accessible header portion 404 with the post-processing alteration processor accessible header 454, in an embodiment. In another embodiment, the output processor 122 retrieves the pre-accessible header portion 410 and the post-accessible header portion 412 from the unified bypass buffer 120, and generates the processed packet header 450 by properly combining the pre-accessible header portion 410 and the post-accessible header portion 412 with the post-processing alteration processor accessible header 454.

With continued reference to FIGS. 4A-B, although the alteration processor accessible header 402 is illustrated in FIG. 4A as including only a single alteration processor accessible header portion 404 extracted from the packet header 400, the alteration processor accessible header 402 includes multiple alteration processor accessible portions 404 extracted from the packet header 400, in other embodiments. At least some of the multiple alteration processor accessible portions 404 are non-contiguous portions of the packet header 400, in an embodiment. For example, the input processor 118 obtains (e.g., based on a thread ID associated with the packet header 400) multiple accessible header anchors 406, where respective ones of the multiple accessible header anchors indicate respective beginning locations and lengths of respective ones of multiple (e.g., non-contiguous) alteration processor accessible portions 404 to be included in the alteration processor accessible header 402. The input processor 118 then extracts the multiple alteration processor accessible portions 404 from the packet header 400, and generates the alteration processor accessible header 402 to include the multiple alteration processor accessible portions 404 extracted from the packet header 400, in an embodiment. The programmable header alteration processor 112 processes the alteration processor accessible header 402 to generate a post-processing alteration processor accessible header 452 that includes multiple post-processing alteration processor accessible header portions, in an embodiment. The output processor 122 then generates the processed packet header 450 by properly incorporating the multiple post-processing alteration processor accessible header portions included in the post-processing alteration processor accessible header 452 into respective post-processing alteration processor accessible header portions 454 of the processed packet header 450, in an embodiment.
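The anchor-based extraction and subsequent reconstruction illustrated in FIGS. 4A-B can be sketched in software as follows. The function names are hypothetical. For simplicity, the sketch assumes that each anchor is an (offset, length) pair expressed in bytes and that anchors are non-overlapping and listed in ascending order, whereas the description above allows a beginning location expressed as a bit position.

```python
def extract(header: bytes, anchors):
    """Return the accessible portions named by the (offset, length) anchors."""
    return [header[off:off + length] for off, length in anchors]


def reconstruct(header: bytes, anchors, processed):
    """Rebuild the header, substituting each processed portion for the
    original portion that its anchor identifies."""
    out = bytearray()
    cursor = 0
    for (off, length), new in zip(anchors, processed):
        out += header[cursor:off]  # untouched bytes before this portion
        out += new                 # processed portion (length may differ)
        cursor = off + length      # skip the original portion
    out += header[cursor:]         # untouched tail of the header
    return bytes(out)
```

Because each processed portion is spliced in by position rather than by length, a portion may grow (e.g., a tag is inserted) or shrink (e.g., a tag is removed) without disturbing the untouched parts of the header.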

In some embodiments, in which the programmable header alteration engine 110 is followed by a purely hardware header alteration engine, the purely hardware header alteration engine then operates on the processed packet header 450 to implement one or more additional header alteration operations on the processed packet header 450.

FIG. 5 is a block diagram of a programmable header alteration processor 500, according to an embodiment. In an embodiment, the programmable header alteration processor 500 corresponds to the programmable header alteration processor 112 of the network device 100 of FIG. 1, and the programmable header alteration processor 500 is described in the context of the network device 100 of FIG. 1 for explanatory purposes. In other embodiments, however, the programmable header alteration processor 500 is used with another suitable network device other than the network device 100 of FIG. 1.

The programmable header alteration processor 500 is configured to receive packet headers from the input processor 118, and to implement corresponding processing threads to process the packet headers. In an embodiment, the programmable header alteration processor 500 receives alteration processor accessible headers (e.g., the alteration processor accessible header 402 of FIG. 4A) extracted from packet headers, rather than the entire packet headers, and processes the alteration processor accessible headers. For ease of explanation, the term “packet header” is used herein to refer to a packet header or an alteration processor accessible header extracted from the packet header.

The programmable header alteration processor 500 includes a plurality of packet processing nodes (PPNs) 502 coupled to a memory 504 and configured to implement computer readable instructions stored in the memory 504. The PPNs 502 are arranged in packet processing node groups 506, in the illustrated embodiment. PPNs 502 of a packet processing node group 506 share certain resources, in an embodiment. For example, although only a single memory 504 is illustrated in FIG. 5, PPNs 502 of respective groups 506 are coupled to respective memories 504, in some embodiments.

The programmable header alteration processor 500 also includes a distributor 508 configured to distribute packet headers among the packet processing nodes 502 for processing of the packet headers, in an embodiment. The distributor 508 generally operates by distributing packet headers to the PPNs 502 in the order in which the packet headers are received by the distributor 508, by cycling through the PPNs 502. In an embodiment, the distributor 508 distributes packet headers to PPNs according to a distribution scheme that allows for packet headers associated with processing threads that require relatively shorter processing times by the PPNs 502 to not be blocked or delayed by packet headers associated with processing threads that require relatively longer processing times by the PPNs 502. According to the distribution scheme, the distributor 508 distributes the packet headers by cycling through the PPNs 502 in accordance with re-cycle numbers assigned to processing threads that are to be used for processing of the packet headers, with smaller re-cycle numbers (e.g., re-cycle number of 1) being assigned to processing threads with relatively shorter processing times and larger re-cycle numbers (e.g., re-cycle number of 2, 3, 4, etc.) assigned to processing threads with relatively longer processing times. When the distributor 508 cycles through the PPNs 502, the distributor 508 skips a PPN 502 during one or more cycles in accordance with a re-cycle number assigned to a processing thread that is being implemented by the PPN 502, if the re-cycle number is greater than 1, in an embodiment. Thus, for example, when a packet header associated with a processing thread with a re-cycle number of 2 is distributed to a particular PPN, the distributor 508 skips over the particular PPN during the next distribution cycle, in an embodiment. Accordingly, during the next distribution cycle, instead of distributing a packet header to the particular PPN, the distributor 508 distributes the packet header to another PPN 502 that is available for processing of the packet header, thereby avoiding blocking of the packet header by the relatively longer processing time of the processing thread being implemented by the particular PPN 502, in an embodiment.

In an embodiment, in addition to a packet header distributed to a PPN 502, the distributor 508 also provides a packet descriptor associated with the corresponding packet and metadata generated by the input processor 118. The programmable header alteration processor 500 also includes a plurality of splitters 510 configured to break up packet headers, packet descriptors and metadata to be provided to a PPN 502 into chunks so that the information is transferred to the PPN 502 in chunks, in an embodiment. Transferring information to PPNs 502 in chunks is performed using suitable narrow interfaces (e.g., busses) between the splitters 510 and the PPNs 502 while still allowing for transfer of relatively large amounts of header information (e.g., selected header portion) and metadata corresponding to a packet header. The respective chunks of information, including e.g., a packet header, a packet descriptor and metadata, associated with a packet, are of a size that corresponds to, or is narrower than, an interface (e.g., a bus) between the splitter 510 and the PPN 502, in an embodiment. For example, a bit stream of information that includes metadata corresponding to a packet, the packet header, and the packet descriptor is broken up into chunks as the bit stream enters the splitter 510, where each chunk corresponds to or is narrower than the interface (e.g., the bus) between the splitter 510 and the PPN 502, in an embodiment. Transferring information to PPNs 502 in chunks also allows the chunks of information corresponding to a packet to be transferred to a PPN 502 concurrently with chunks of information already processed by the PPN 502 (e.g., chunks of a packet header, a packet descriptor, metadata, etc. corresponding to a previous packet header processed by the PPN 502) being transferred out from the PPN 502, in an embodiment.

Although only a single splitter corresponding to each PPN group 506 is illustrated in FIG. 5, the programmable header alteration processor 500 includes multiple splitters 510 corresponding to each of the packet processing node groups 506, in some embodiments. Respective ones of the splitters 510 operate in parallel to break up and transfer packet headers, packet descriptors, and metadata, to different ones of the PPNs 502, in an embodiment. In an embodiment, each particular splitter 510, at any given time, operates on a packet header, packet descriptor, and metadata corresponding to a particular packet, and generates chunks of the packet header, packet descriptor, and metadata corresponding to the particular packet. The chunks generated by the splitter 510 are provided to a particular PPN 502 for processing of the particular header, in an embodiment. In this manner, chunks of information (packet header, packet descriptor, and metadata) corresponding to a particular packet are transferred to a particular PPN serially, enabling the transfer of relatively large amounts of information via a suitably narrow communication bus, in an embodiment. On the other hand, chunks of information (packet header, packet descriptor, and metadata) corresponding to different packets are transferred to different PPNs 502 in parallel, in an embodiment.

In an embodiment, a splitter 510 causes at least a portion of metadata corresponding to a packet header to be transferred to a PPN 502 prior to transferring corresponding header information to the PPN 502. For example, an initial chunk (e.g., the first chunk) corresponding to a packet header transferred from the splitter 510 to the PPN 502 includes at least the portion of the metadata, in an embodiment. The at least the portion of the metadata transferred to the PPN 502 includes a thread ID associated with the packet header, or other indicator of a processing thread that is to be used by the PPN 502 to process the packet header, such as a pointer to a memory location at which a beginning of the processing thread is stored in the memory 504, in an embodiment. Accordingly, the PPN 502 receives the thread ID or the other indicator of the processing thread prior to receiving the header information that is to be processed using the processing thread, in an embodiment. In an embodiment, the PPN 502 retrieves at least initial instructions corresponding to the processing thread from the memory 504 prior to receiving the header information. The PPN 502 then begins implementing the processing thread immediately upon receiving the header information, in an embodiment. In an embodiment, the PPN 502 begins implementing the processing thread immediately upon receiving a first chunk that includes header information.
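A minimal sketch of the chunking performed by a splitter 510, and of the reassembly performed on the receiving side, is given below. The function names are hypothetical and the sketch works in bytes rather than bits. It assumes that the metadata is placed at the front of the stream with the thread ID as its first byte; the description above only requires that at least the thread ID (or another thread indicator) arrive before the header information.

```python
def split_into_chunks(metadata: bytes, header: bytes, descriptor: bytes,
                      bus_width: int):
    """Break the per-packet stream into chunks no wider than the bus.

    Metadata goes first so that the receiving node sees the thread ID
    before any header bytes arrive (assumed layout).
    """
    stream = metadata + header + descriptor
    return [stream[i:i + bus_width] for i in range(0, len(stream), bus_width)]


def reassemble(chunks):
    """Concatenate serially received chunks back into one stream."""
    return b"".join(chunks)
```

With a bus width of 4 bytes, for instance, an 18-byte stream is transferred as four full chunks followed by one 2-byte chunk, and the thread ID is available to the node as soon as the first chunk arrives.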

The programmable header alteration processor 500 also includes a plurality of accumulators 512 and an aggregator 514. Chunks of packet headers that are processed by a PPN 502 are transferred to an accumulator 512 that assembles the chunks, in an embodiment. Although only a single accumulator 512 corresponding to each packet processing node group 506 is illustrated in FIG. 5, the programmable header alteration processor 500 generally includes multiple accumulators 512 corresponding to each of the packet processing node groups 506, in an embodiment. Respective ones of the accumulators 512 operate in parallel to receive and assemble processed header chunks from different ones of the PPNs 502, in an embodiment.

The aggregator 514 aggregates packet headers distributed to the PPNs 502 into processed packet header streams, in an embodiment. To maintain the order of packet headers processed by the PPNs 502 as the packet headers go through the programmable header alteration processor 500, the aggregator 514 utilizes an aggregation scheme that corresponds to the distribution scheme used by the distributor 508, in an embodiment. According to the aggregation scheme, the aggregator 514 aggregates the packet headers by cycling through the PPNs 502 in accordance with the re-cycle numbers that were used by the distributor 508 to distribute the packet headers. Thus, when the aggregator 514 cycles through the PPNs 502, the aggregator 514 skips a PPN 502 during one or more cycles in accordance with a re-cycle number that was used by the distributor 508 when distributing the packet headers, thereby maintaining the order of the packet headers, in an embodiment.
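The relationship between the distribution scheme and the aggregation scheme can be illustrated with a small simulation. It is a sketch under stated assumptions rather than the described hardware: the names `walk`, `distribute` and `aggregate` are hypothetical, a "cycle" is modeled as one pass over all PPNs, and a PPN that receives a header with re-cycle number r is skipped on the next r - 1 passes.

```python
from collections import deque


def walk(num_ppns, recycle_numbers):
    """Yield the PPN index chosen for each header, in arrival order."""
    skips = [0] * num_ppns
    i = 0
    while i < len(recycle_numbers):
        for ppn in range(num_ppns):
            if i == len(recycle_numbers):
                break
            if skips[ppn] > 0:   # PPN still busy with a longer thread
                skips[ppn] -= 1
                continue
            skips[ppn] = recycle_numbers[i] - 1
            yield ppn
            i += 1


def distribute(headers, recycle_numbers, num_ppns):
    """Place each header in the queue of the PPN selected by the walk."""
    per_ppn = [deque() for _ in range(num_ppns)]
    for hdr, ppn in zip(headers, walk(num_ppns, recycle_numbers)):
        per_ppn[ppn].append(hdr)
    return per_ppn


def aggregate(per_ppn, recycle_numbers):
    """Collect headers by replaying the same walk, restoring arrival order."""
    return [per_ppn[ppn].popleft()
            for ppn in walk(len(per_ppn), recycle_numbers)]
```

Because the aggregator replays exactly the skip pattern used by the distributor, it visits the PPNs in the same sequence and therefore emits headers in their original order, even though a header with a larger re-cycle number occupied its PPN for longer.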

The programmable header alteration processor 500 operates on a faster clock than a core clock used to operate the remainder of the packet processor 104, including components of the programmable header alteration engine 110 other than the programmable header alteration processor 500. Accordingly, the header alteration processor 500 includes an input synchronization component, such as a sync FIFO 520, and an output synchronization component, such as a sync FIFO 522, to synchronize between the two clock domains, in an embodiment. Operating the programmable header alteration processor 500 on a faster clock relative to the core clock allows the programmable header alteration processor 500 to implement relatively longer processing threads to perform header alteration for packets flowing through the packet processor 104 without slowing down the traffic flowing through the packet processor 104 in at least some situations, in an embodiment.

With continued reference to FIG. 5, during processing of the packet headers, PPNs 502 engage one or more hardware accelerator engines 530 to perform specific processing operations on the packet headers, in an embodiment. Although a single set of hardware accelerator engines 530 shared by the PPNs 502 is illustrated in FIG. 5, the programmable header alteration processor 500 includes multiple sets of hardware accelerator engines 530, in some embodiments. For example, each PPN 502 includes a respective set of hardware accelerator engines 530 implemented in hardware, in an embodiment. In an embodiment, the hardware accelerator engines 530 are implemented using one or more integrated circuits configured to operate as described herein. In other embodiments, the hardware accelerator engines 530 are implemented in other suitable manners.

The one or more hardware accelerator engines 530 are configured to perform respective specific operations that would take significantly longer (e.g., twice as long, four times as long, one hundred times as long, etc.) to be performed via computer readable instructions, in an embodiment. For example, the one or more hardware accelerator engines 530 are configured to perform respective information shifting operations to shift fields in packet headers, for example for insertion of tags into the packet headers and/or for removal of tags from the packet headers. As a more specific example, in an embodiment, the one or more hardware accelerator engines 530 include a bit shift operation accelerator engine 530-1 configured to shift fields in the packet headers by given numbers of bits, and a byte shift operation accelerator engine 530-2 configured to shift fields in packet headers by given numbers of bytes. As another example, the accelerator engines 530 include a checksum accelerator engine 530-3 that is configured to calculate an incremental update to a checksum field of a packet header, in an embodiment. In other embodiments, the accelerator engines 530 additionally or alternatively include accelerator engines 530 configured to perform other suitable operations. In at least some embodiments, engaging hardware accelerator engines 530 to perform the processing operations allows a PPN 502, for example, to more quickly insert a tag (e.g., a forwarding tag) into a packet header and/or to more quickly strip a tag (e.g., a forwarding tag) from a packet header as compared to an implementation in which a processor executing computer readable instructions does not engage hardware engines to perform the operations (e.g., the information shifting operations). In at least some embodiments, engaging hardware accelerator engines 530 to perform the processing operations permits the PPN 502 to perform header alteration processing of packet headers at full-wire speed at which the network device 100 receives packets.
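As a rough illustration of the information shifting operations described above, the following Python sketch inserts and strips a tag by shifting the bytes that follow the insertion point. The function names, the 12-byte offset, and the tag value are assumptions chosen for illustration (they resemble a VLAN tag placed after the MAC addresses of an Ethernet header) and are not taken from this disclosure. An accelerator engine such as the byte shift operation accelerator engine 530-2 would perform the shift in hardware rather than in software.

```python
def insert_tag(header: bytes, offset: int, tag: bytes) -> bytes:
    """Shift header[offset:] right by len(tag) bytes and place the tag in the gap."""
    if not 0 <= offset <= len(header):
        raise ValueError("offset outside header")
    return header[:offset] + tag + header[offset:]


def strip_tag(header: bytes, offset: int, tag_len: int) -> bytes:
    """Shift header[offset + tag_len:] left by tag_len bytes, removing the tag."""
    if offset < 0 or tag_len < 0 or offset + tag_len > len(header):
        raise ValueError("tag outside header")
    return header[:offset] + header[offset + tag_len:]


# Hypothetical Ethernet header: destination MAC, source MAC, EtherType (IPv4).
eth = bytes.fromhex("ffffffffffff" "020000000001" "0800")
# Hypothetical 4-byte tag: TPID 0x8100 followed by VLAN ID 100.
vlan_tag = bytes.fromhex("8100" "0064")

tagged = insert_tag(eth, 12, vlan_tag)
restored = strip_tag(tagged, 12, len(vlan_tag))
```

In this sketch the tag lands between the source MAC address and the EtherType, and stripping it restores the original header byte for byte.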

As discussed above, in an embodiment, the checksum accelerator engine 530-3 is configured to calculate an incremental update to a checksum field of a packet header. In an embodiment, a PPN 502 that is processing a packet header of a packet is configured to trigger the checksum accelerator engine 530-3 to calculate a checksum based on a bit string extracted from the packet header. The bit string corresponds to a particular portion of the packet header, depending, for example, on the type of packet to which the packet header corresponds, in an embodiment. As an example, in an embodiment and/or scenario, the bit string corresponds to an IP header, such as an IPv4 or an IPv6 header, included in the packet header. In this case, the PPN 502 triggers the checksum accelerator engine 530-3 to generate a checksum to be included in a checksum field of the IP header, in an embodiment. As another example, in another embodiment and/or scenario, the bit string corresponds to a UDP header included in the packet header, and the PPN 502 triggers the checksum accelerator engine 530-3 to generate a checksum to be included in a checksum field of the UDP header. As yet another example, in another embodiment and/or scenario, the bit string corresponds to a generic UDP encapsulation (GUE) header included in the packet header, and the PPN 502 triggers the checksum accelerator engine 530-3 to generate a checksum to be included in a checksum field of the GUE header. In other embodiments, the bit string corresponds to other suitable portions of the header.
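To make the notion of an incremental checksum update concrete, the following Python sketch applies the standard Internet checksum update rule of RFC 1624 (HC' = ~(~HC + ~m + m')) to an IPv4 header whose TTL is decremented. This disclosure does not state which formula the checksum accelerator engine 530-3 uses, so the choice of RFC 1624, the function names, and the sample header are assumptions for illustration only.

```python
def ones_complement_sum(words):
    """16-bit ones' complement sum with end-around carry."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)
    return total


def full_checksum(words):
    """Checksum over all 16-bit words, with the checksum word zeroed beforehand."""
    return ~ones_complement_sum(words) & 0xFFFF


def incremental_update(old_checksum, old_word, new_word):
    """RFC 1624, equation 3: HC' = ~(~HC + ~m + m')."""
    terms = [~old_checksum & 0xFFFF, ~old_word & 0xFFFF, new_word]
    return ~ones_complement_sum(terms) & 0xFFFF


# Sample IPv4 header as 16-bit words; the checksum word (index 5) is zeroed.
header = [0x4500, 0x0073, 0x0000, 0x4000, 0x4011,
          0x0000, 0xC0A8, 0x0001, 0xC0A8, 0x00C7]
old = full_checksum(header)

# Decrement the TTL from 0x40 to 0x3F; only the word at index 4 changes.
updated = header.copy()
updated[4] = 0x3F11
new = incremental_update(old, header[4], updated[4])
```

The incremental result matches a full recomputation over the modified header while touching only the old checksum and the one changed word, which is why an incremental update suits header alteration.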

In an embodiment, to trigger the checksum accelerator engine 530-3, the PPN 502 is configured to partition the bit string into a plurality of segments, and to generate respective trigger instructions to serially transfer respective ones of the segments to the checksum accelerator engine 530-3. As an example, the PPN 502 is configured to partition the bit string into up to four segments, each segment comprising 16 bits, and to generate respective trigger instructions to serially transfer respective ones of the 16-bit segments to the checksum accelerator engine 530-3. In other embodiments, other suitable numbers of segments and/or segments of other suitable sizes are utilized. The checksum accelerator engine 530-3 is configured to add the respective segments to each other in stages, and to accumulate the calculated sum, in an embodiment. For example, the checksum accelerator engine 530-3 includes one or more adders (e.g., serial adders, such as 16-bit serial adders) and an accumulator, in an embodiment. The one or more adders are configured to incrementally determine a sum of the respective segments in multiple summation stages by providing a sum generated in each stage to the accumulator, and subsequently summing a subsequent one of the segments with the current sum in the accumulator, in an embodiment. In at least some embodiments, because respective segments of a bit string are serially transferred to the checksum accelerator engine 530-3 and are used to incrementally generate a checksum by the checksum accelerator engine 530-3, a smaller checksum accelerator engine 530-3, in terms of size, power consumption, etc., can be used as compared to a system in which an entire bit string is provided in a same clock cycle to a checksum engine, particularly when relatively long bit strings need checksum calculation.
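The segment-by-segment accumulation described above can be modeled in software as follows. This is a minimal sketch, not the engine's actual design: the class and function names are hypothetical, one trigger is assumed to carry one 16-bit segment, the number of segments is not limited, and the final fold assumes the ones' complement arithmetic of the Internet checksum (RFC 1071).

```python
class ChecksumEngineModel:
    """Hypothetical software model of an adder plus accumulator."""

    def __init__(self):
        self.accumulator = 0

    def trigger(self, segment: int) -> None:
        """One trigger transfers one 16-bit segment; one summation stage runs."""
        if not 0 <= segment <= 0xFFFF:
            raise ValueError("segment is not 16 bits wide")
        self.accumulator += segment

    def result(self) -> int:
        """Fold accumulated carries back into 16 bits (ones' complement sum)."""
        total = self.accumulator
        while total >> 16:
            total = (total & 0xFFFF) + (total >> 16)
        return total


def partition(bit_string: bytes):
    """Split a byte string into 16-bit big-endian segments, zero-padding odd lengths."""
    if len(bit_string) % 2:
        bit_string += b"\x00"
    return [int.from_bytes(bit_string[i:i + 2], "big")
            for i in range(0, len(bit_string), 2)]


def serial_sum(bit_string: bytes) -> int:
    """Transfer the segments one at a time and return the accumulated sum."""
    engine = ChecksumEngineModel()
    for segment in partition(bit_string):
        engine.trigger(segment)
    return engine.result()


# Sample IPv4 header with its checksum field zeroed (illustrative data).
ip_header = bytes.fromhex("45000073" "00004000" "40110000" "c0a80001" "c0a800c7")
header_sum = serial_sum(ip_header)
```

Because only one 16-bit segment is handled per stage, the model needs a single 16-bit adder regardless of how long the bit string is, which mirrors the size and power argument made above.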

In some embodiments, the checksum accelerator engine 530-3 is configured to wrap one or more “carry” bits in the sum generated by a calculation in an adder into the accumulator in a same clock cycle in which the calculation is performed, thereby eliminating the need to perform a final wrap-around operation after the checksum calculation is completed. When the checksum accelerator engine 530-3 completes generation of the checksum, the checksum accelerator engine 530-3 outputs the checksum, for example by providing the checksum back to the PPN 502 or by writing the checksum to a memory location indicated by the PPN 502. In an embodiment, the PPN 502 is configured to provide, to the checksum accelerator engine 530-3, an indicator (e.g., a destination pointer) of a memory location (e.g., in a packet register file) to which the calculated checksum is to be written. For example, the PPN 502 is configured to include the indicator in one or more trigger instructions that are used to transfer the respective segments to the checksum accelerator engine 530-3, in an embodiment. When the checksum calculation is completed by the checksum accelerator engine 530-3, the checksum accelerator engine 530-3 writes the checksum to the indicated memory location, thereby placing the calculated checksum into an appropriate checksum field in the packet header, in an embodiment.
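The effect of wrapping the carry in the same step as each addition can be sketched as follows. Both functions are hypothetical software models assuming 16-bit ones' complement addition; the first defers the wrap-around to the end, while the second folds the carry into a 16-bit accumulator on every add, so no final wrap-around step remains.

```python
def sum_with_final_wrap(segments):
    """Accumulate in a wide register, then fold the carries once at the end."""
    total = sum(segments)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def sum_with_per_cycle_wrap(segments):
    """Wrap the carry into the 16-bit accumulator in the same step as each add."""
    acc = 0
    for seg in segments:
        raw = acc + seg                      # at most 17 bits wide
        acc = (raw & 0xFFFF) + (raw >> 16)   # end-around carry, fits in 16 bits
    return acc


# Illustrative 16-bit segments chosen so that several additions overflow.
segments = [0xFFFF, 0x0001, 0xC0A8, 0xC0A8, 0x8000, 0x8001]
deferred = sum_with_final_wrap(segments)
per_cycle = sum_with_per_cycle_wrap(segments)
```

Both approaches yield the same sum; the per-cycle variant simply never lets the accumulator grow beyond 16 bits.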

In some embodiments, the checksum accelerator engine 530-3 is configured to invert the checksum prior to outputting the checksum in at least some situations. For example, the checksum accelerator engine 530-3 is configured to invert the calculated checksum when the checksum is generated based on an IP header, or, in at least some situations, based on a UDP header, in an embodiment. In an embodiment, the checksum accelerator engine 530-3 is configured to determine whether or not to invert a calculated checksum based on an instruction included in (e.g., encoded into) one or more trigger instructions that are used to transfer the respective segments to the checksum accelerator engine 530-3. For example, the PPN 502 is configured to encode a trigger instruction as a “load new checksum” instruction to indicate that no inversion of the calculated checksum is required, or to encode the trigger instruction as a “load old checksum” instruction to indicate that inversion of the calculated checksum is required, in an embodiment. In some embodiments and/or scenarios, the checksum accelerator engine 530-3 additionally or alternatively determines whether or not to invert a calculated checksum based on the value of the calculated checksum. For example, for a checksum calculated based on a UDP header, the checksum accelerator engine 530-3 is configured to invert the calculated checksum if the value of the calculated checksum is 0xFFFF, in an embodiment.

FIG. 6 is a block diagram of an example packet processing node 600 included in a header alteration processor, according to an embodiment. In an embodiment, the packet processing node 600 corresponds to packet processing nodes 502 of the programmable header alteration processor 500 of FIG. 5, and the packet processing node 600 is described in the context of the programmable header alteration processor 500 of FIG. 5 for explanatory purposes. In other embodiments, however, the packet processing node 600 is used with another suitable processor device other than the programmable header alteration processor 500 of FIG. 5.

The packet processing node 600 includes a processor 601, a scratch pad (e.g., a data RAM) 602 and a packet register file 603. The processor 601 is generally configured to implement a processing thread retrieved from the memory 504 to process a packet header provided to the processing node 600, according to an embodiment. In an embodiment, the processor 601 is a small general purpose CPU or another suitable processor configured to execute computer readable instructions stored in the memory 504. The packet processing node 600 additionally includes a plurality of accelerator engines 604 and a packet register file access system 605. The accelerator engines 604 correspond to the accelerator engines 530, in an embodiment. In another embodiment, the accelerator engines 604 additionally or alternatively include accelerator engines different from the accelerator engines 530. The packet register file access system 605 allows the processor 601 to use native CPU instructions, such as native CPU read/write instructions, to both trigger accelerator engines 604 and provide regular access to the packet register file 603 for read/write operations not related to accelerator engines 604, in an embodiment.

The packet register file 603 is configured to store packet information provided to the packet processing node 600 by the distributor 508, such as at least a portion of a packet header of a packet, a descriptor corresponding to the packet, metadata generated for the packet by the input processor 200 (FIG. 2), etc. The packet register file 603 is configured to also store various data generated during implementation of a processing thread with respect to the packet. The packet register file 603 is additionally configured to store data generated by the accelerator engines 604, such as portions of packet headers processed by the accelerator engines 604, in an embodiment. The packet register file 603 comprises, for example, an array of registers and/or a multi-port random access memory (RAM), in an embodiment. In other embodiments, the packet register file 603 is implemented in other suitable manners.

The packet register file access system 605 includes an input circuit 610 and an instruction decoder 614, in an embodiment. In operation, instructions 611 issued by the processor 601 are provided to the input circuit 610. The instructions 611 comprise read or write instructions, in an embodiment. The read or write instructions are encoded as accelerator engine instructions, in some situations, and are encoded as regular read or write instructions in other situations, in an embodiment. For example, the processor 601 is configured to encode a read instruction as an accelerator trigger instruction to trigger an accelerator engine 604 to perform an operation, in an embodiment. As another example, the processor 601 is configured to encode a write instruction as an accelerator write instruction to dynamically write accelerator engine operand data to the packet register file 603 prior to triggering an accelerator engine 604 to perform an operation. In other embodiments, the instructions 611 comprise suitable types of instructions other than write/read instructions encoded as accelerator instructions.

The input circuit 610 includes circuitry configured to perform hazard resolution to resolve situations in which a read or write instruction is provided to the packet register file access system 605 concurrently (e.g., in a same clock cycle) with another read or write instruction or while the packet register file access system 605 is busy handling a previously issued read or write instruction. In an embodiment, the input circuit 610 includes a buffer (not shown) that temporarily buffers one or more instructions until the packet register file access system 605 is able to handle the one or more instructions.

The input circuit 610 writes the instruction 611 to the scratch pad memory 602, in an embodiment, and the instruction decoder 614 decodes the instruction written to the scratch pad memory 602. In an embodiment, the instruction decoder 614 decodes the instruction 611 based on a memory map used by the processor 601, where the memory map includes mappings of both regular read/write instructions and read/write instructions encoded as accelerator instructions. FIG. 7 is a block diagram of an example memory map 700 used by the processor 601, according to an embodiment. The memory map 700 includes an accelerator instruction mapping 702 that maps instructions addressed to addresses in a first range of addresses to accelerator instructions, in an embodiment. Thus, for example, read/write instructions having addresses in the range of 0000 1000 to 0000 1FFF map to accelerator instructions, according to the memory map 700 in the illustrated embodiment. The memory map 700 additionally includes a packet register file mapping 704 that maps instructions addressed to addresses in a second range of addresses to regular packet register file read/write instructions, in an embodiment. For example, read/write instructions having addresses in the range of 0000 2000 to 0000 23FF map to regular packet register file instructions, according to the memory map 700 in the illustrated embodiment.
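The address-based decoding described above can be sketched as a simple range check. The following Python sketch assumes that the example addresses are hexadecimal and that both ranges are inclusive; the enum and function names are hypothetical, and any address outside the two example ranges is merely reported as unmapped because the remainder of the memory map 700 is not described here.

```python
from enum import Enum


class Target(Enum):
    ACCELERATOR = "accelerator instruction"
    PACKET_REGISTER_FILE = "regular packet register file read/write"
    UNMAPPED = "not covered by this sketch"


# Example ranges from the memory map 700, read as hexadecimal (an assumption).
ACCELERATOR_RANGE = range(0x0000_1000, 0x0000_1FFF + 1)
PACKET_REGISTER_FILE_RANGE = range(0x0000_2000, 0x0000_23FF + 1)


def classify(address: int) -> Target:
    """Map the address of a read/write instruction to the kind of access it encodes."""
    if address in ACCELERATOR_RANGE:
        return Target.ACCELERATOR
    if address in PACKET_REGISTER_FILE_RANGE:
        return Target.PACKET_REGISTER_FILE
    return Target.UNMAPPED
```

For example, `classify(0x1004)` returns `Target.ACCELERATOR`, while `classify(0x2010)` returns `Target.PACKET_REGISTER_FILE`. Because the distinction rests only on the address, the processor can use its ordinary load and store instructions for both kinds of access.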

Referring again to FIGS. 6 and 7, in an embodiment, when the address of the instruction 611 is in the range of addresses mapped by the memory map 700 to accelerator instructions, the instruction decoder 614 decodes the instruction 611 as an accelerator instruction, such as a trigger instruction or an accelerator write instruction. In an embodiment, the accelerator instruction 611 is a 32-bit processor instruction, and the accelerator instruction 611 is described in the context of a 32-bit processor instruction for exemplary purposes. In other embodiments, however, other suitable processor instructions (e.g., 16-bit instructions, 64-bit instructions) are utilized. In an embodiment, bits [31:26] of the accelerator instruction are encoded to indicate a type of operation being requested by the accelerator instruction and, accordingly, a type of accelerator engine being triggered by the instruction. As an example, bits [31:26] of the accelerator instruction 611 are encoded to indicate that the accelerator instruction 611 is one of i) a byte copy instruction requesting a byte field copy operation, ii) a load new checksum instruction requesting a new checksum load operation, iii) a load old checksum instruction requesting an old checksum load operation, iv) a bit field copy instruction requesting a bit field copy operation, v) a bit field add instruction requesting a bit field add operation, vi) a bit field subtract instruction requesting a bit field subtract operation, etc. The remaining bits [25:0] of the accelerator instruction 611 are encoded to indicate locations of source and/or destination header fields, descriptor fields or configuration registers in the packet register file 603, in an embodiment. Example encodings of instructions, according to some embodiments, are described in more detail below with reference to FIGS. 8A-B.

The instruction decoder 614 determines whether the instruction 611 maps to an accelerator instruction or to a regular read/write instruction, in an embodiment. If the instruction 611 maps to an accelerator instruction, the instruction decoder 614 decodes the instruction as an accelerator trigger instruction or an accelerator write instruction, and extracts indications of relevant source/destination locations in the packet register file 603 that are source/destination fields of the corresponding instruction, in an embodiment. The accelerator trigger instructions decoded by the instruction decoder 614 and the regular CPU read instructions issued by the processor 601 are provided to a junction 620, in an embodiment. Similarly, the accelerator write instructions decoded by the instruction decoder 614 and regular write instructions issued by the processor 601 are provided to a junction 630. In an embodiment, to equalize i) the time between issue of an accelerator instruction and the time the corresponding decoded accelerator instruction is provided to the junction 620/630 and ii) the time between issue of a regular CPU read instruction and the time the regular CPU instruction is provided to the junction 620/630, the decoded accelerator instruction and/or the regular CPU instructions are provided to the junction 620/630 via one or more delay lines, such as delay lines 618/628 in FIG. 6.

The decoded accelerator trigger instruction controls a multiplexer 622 to select appropriate operand data for execution of the instruction. In an embodiment, the multiplexer 622 selects between operand data read from the packet register file 603 and operand data obtained from an accelerator engine 604, allowing for direct use of data generated by the accelerator engine 604 in a previously triggered operation for execution of the current accelerator trigger instruction. Data generated by the accelerator engine 604 is also written to the packet register file 603 when the packet register file 603 contains the destination of the executed operation, as indicated by the accelerator trigger instruction 611, in an embodiment. Similarly, data requested to be written to the packet register file 603 by a regular CPU write instruction is written to the packet register file 603, in an embodiment. Further, data read from the packet register file 603 in response to a regular CPU read instruction is provided directly to the processor 601, in an embodiment.

FIGS. 8A-B are diagrams illustrating several example accelerator instructions that correspond to instructions 611 issued by the processor 601, according to embodiments. FIG. 8A is a diagram of an example byte field operation instruction 800, according to an embodiment. FIG. 8A illustrates example bit allocations used in the byte field operation instruction 800. In other embodiments, suitable bit allocations different from the bit allocation illustrated in FIG. 8A are utilized.

In an embodiment, bits [31:26] of the instruction 800 are encoded to indicate a type of a byte field operation being triggered by the instruction 800. For example, a first value (e.g., 000000) of the bits [31:26] of the instruction 800 indicates that a byte field copy operation is triggered, a second value (e.g., 000001) of the bits [31:26] of the instruction 800 indicates that a checksum load operation is triggered, etc., in various embodiments. Bits [25:21] of the instruction 800 are encoded to indicate a subtype of the byte field operation, in some situations, in an embodiment. For example, when bits [31:26] of the instruction 800 are set to indicate a checksum load operation, bits [25:21] of the instruction 800 are set to indicate whether the instruction 800 is a load new checksum instruction for loading a newly calculated checksum into a packet header checksum field or a load old checksum instruction for loading an old checksum into a packet header checksum field, in an embodiment.

With continued reference to FIG. 8A, bits [20:16] of the instruction 800 are encoded to indicate a length of the destination/source fields of the operation, in an embodiment. In an embodiment, for example, bits [20:16] are set to a value corresponding to the length of the field minus one (Length-1). Bits [15:8] of the instruction 800 are encoded to indicate a location of the destination field to which the operation is to be applied, in an embodiment. In an embodiment, bit 15 of the instruction 800 is encoded to indicate whether the destination field of the operation is a packet header field or a non-packet header field, such as a packet descriptor field or other packet register field. Further, bits [14:8] of the instruction 800 are encoded to indicate an index of a first byte of the destination field in a packet header in the packet register file 603, an index of a first byte of the destination field in a packet descriptor in the packet register file 603, or an index of a first byte in a configuration register in the packet register file 603, in accordance with the indication encoded in bit 15, in an embodiment. Bits [7:0] of the instruction 800 are encoded to indicate a location of a source field from which source data of the operation is to be obtained, in an embodiment. In an embodiment, bit 7 of the instruction 800 is encoded to indicate whether the source field of the operation is a packet header field or a non-packet header field, such as a packet descriptor field or other packet register field. Further, bits [6:0] of the instruction 800 are encoded to indicate an index of a first byte of the source field in a packet header in the packet register file 603, an index of a first byte of the source field in a packet descriptor in the packet register file 603, or an index of a first byte in a configuration register in the packet register file 603, in accordance with the indication encoded in bit 7, in an embodiment.
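One way to picture the byte field operation instruction 800 is as a packed 32-bit word. The Python sketch below packs and unpacks such a word under several assumptions that go beyond the text: the length field is taken to occupy bits [20:16] so that it does not overlap the destination location in bits [15:8], the polarity of the flag bits 15 and 7 is left uninterpreted, and all class, field, and function names are hypothetical.

```python
from typing import NamedTuple


class ByteFieldInstruction(NamedTuple):
    opcode: int      # bits [31:26], type of byte field operation
    subtype: int     # bits [25:21], e.g. load new vs. load old checksum
    length: int      # field length in bytes, stored as length - 1 in bits [20:16]
    dst_flag: int    # bit 15, header vs. non-header destination (polarity assumed)
    dst_index: int   # bits [14:8], index of first destination byte
    src_flag: int    # bit 7, header vs. non-header source (polarity assumed)
    src_index: int   # bits [6:0], index of first source byte


def _field(word: int, hi: int, lo: int) -> int:
    """Extract bits [hi:lo] of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def encode(instr: ByteFieldInstruction) -> int:
    """Pack the fields into a 32-bit instruction word."""
    if not 1 <= instr.length <= 32:
        raise ValueError("length must fit in 5 bits after subtracting one")
    return (instr.opcode << 26 | instr.subtype << 21 | (instr.length - 1) << 16
            | instr.dst_flag << 15 | instr.dst_index << 8
            | instr.src_flag << 7 | instr.src_index)


def decode(word: int) -> ByteFieldInstruction:
    """Unpack a 32-bit instruction word into its fields."""
    return ByteFieldInstruction(
        opcode=_field(word, 31, 26),
        subtype=_field(word, 25, 21),
        length=_field(word, 20, 16) + 1,
        dst_flag=_field(word, 15, 15),
        dst_index=_field(word, 14, 8),
        src_flag=_field(word, 7, 7),
        src_index=_field(word, 6, 0),
    )


# Hypothetical byte field copy: 4 bytes from byte index 12 to byte index 16.
copy = ByteFieldInstruction(opcode=0, subtype=0, length=4,
                            dst_flag=0, dst_index=16, src_flag=0, src_index=12)
word = encode(copy)
```

Storing the length as Length-1 lets a 5-bit field express lengths of 1 through 32 bytes, since a zero-length operation never needs to be encoded.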

FIG. 8B is a diagram of an example bit field operation instruction 850, according to an embodiment. FIG. 8B illustrates example bit allocations used in the bit field operation instruction 850. In other embodiments, suitable bit allocations different from the bit allocation illustrated in FIG. 8B are utilized. In an embodiment, bits [31:26] of the instruction 850 are encoded to indicate a type of a bit field operation being triggered by the instruction 850. For example, a first value (e.g., 000001) of the bits [31:26] of the instruction 850 indicates a bit field copy operation is triggered, a second value (e.g., 000010) of the bits [31:26] of the instruction 850 indicates a bit field add operation is triggered, a third value (e.g., 000011) of the bits [31:26] of the instruction 850 indicates a bit field subtract operation is triggered, etc., in various embodiments. Bits [25:23] of the instruction 850 are encoded to indicate a beginning location of a set of destination bits of the operation, and bits [22:20] of the instruction 850 are encoded to indicate a beginning location of a set of source bits of the operation, in an embodiment. Bits [19:16] are encoded to indicate a length (e.g., a number of bits) of the set of destination and source bits of the operation, in an embodiment.

With continued reference to FIG. 8B, bits [15:0] of the instruction 850 are encoded to indicate byte locations of the destination and source fields that include the sets of bits indicated by the bits [25:16], in an embodiment. In an embodiment, bit 15 of the instruction 850 is encoded to indicate whether the destination field of the operation is a packet header field or a non-packet header field, such as a packet descriptor field or other packet register field. Further, bits [14:8] of the instruction 850 are encoded to indicate an index of a first byte of the destination field in a packet header in the packet register file 603, an index of a first byte of the destination field in a packet descriptor in the packet register file 603, or an index of a first byte in a configuration register in the packet register file 603, in accordance with the indication encoded in bit 15, in an embodiment. Similarly, bits [7:0] of the instruction 850 are encoded to indicate a location of a source field from which source data of the operation is to be obtained, in an embodiment. In an embodiment, bit 7 of the instruction 850 is encoded to indicate whether the source field of the operation is a packet header field or a non-packet header field, such as a packet descriptor field or other packet register field. Further, bits [6:0] of the instruction 850 are encoded to indicate an index of a first byte of the source field in a packet header in the packet register file 603, an index of a first byte of the source field in a packet descriptor in the packet register file 603, or an index of a first byte in a configuration register in the packet register file 603, in accordance with the indication encoded in bit 7, in an embodiment.
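The bit field operation instruction 850 can be unpacked in the same spirit. The following Python sketch is illustrative only and rests on assumptions: the source bit offset is taken to occupy bits [22:20] (the three bits between the destination offset and the length), the 8-bit destination and source locations are kept whole rather than split into a flag bit and a 7-bit index, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitFieldInstruction:
    opcode: int        # bits [31:26], e.g. bit field copy, add, or subtract
    dst_bit: int       # bits [25:23], first destination bit within the byte
    src_bit: int       # bits [22:20], first source bit (bit range assumed)
    length: int        # bits [19:16], number of bits operated on
    dst_location: int  # bits [15:8], flag bit 15 plus destination byte index
    src_location: int  # bits [7:0], flag bit 7 plus source byte index


def bits(word: int, hi: int, lo: int) -> int:
    """Extract bits [hi:lo] of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_bit_field(word: int) -> BitFieldInstruction:
    """Split a 32-bit bit field operation word into its fields."""
    return BitFieldInstruction(
        opcode=bits(word, 31, 26),
        dst_bit=bits(word, 25, 23),
        src_bit=bits(word, 22, 20),
        length=bits(word, 19, 16),
        dst_location=bits(word, 15, 8),
        src_location=bits(word, 7, 0),
    )


# Hypothetical bit field add (example opcode 000010): 4 bits, starting at bit 5
# of the byte at location 0x0C, combined into bit 3 of the byte at location 0x10.
example = decode_bit_field(0x09D4100C)
```

A 3-bit offset is enough to address any starting bit within a byte, which is consistent with the byte locations being carried separately in bits [15:0].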

FIG. 9 is a flow diagram illustrating an example method 900 for processing packets in a network device, according to an embodiment. In an embodiment, the network device 100 implements the method 900 to process a packet received by the network device 100. Thus, the method 900 is described with reference to the network device 100 merely for explanatory purposes. In other embodiments, the method 900 is implemented by another suitable network device.

At block 902, a packet received via a port of a network device is received by a packet processor of the network device. The packet received at block 902 includes a packet header and a payload, in an embodiment. In an embodiment, the packet is received by the packet processor 104 of the network device 100 of FIG. 1. In another embodiment, the packet is received by a suitable packet processor different from the packet processor 104 of the network device 100 of FIG. 1 and/or is received by a packet processor of a network device different from the network device 100 of FIG. 1.

At block 904, at least one egress interface via which the packet is to be transmitted by the network device is determined. In an embodiment, the forwarding engine 106 of the packet processor 104 of FIG. 1 determines the at least one egress interface via which the packet is to be transmitted. In another embodiment, another suitable device (other than the forwarding engine 106) determines the at least one egress interface via which the packet is to be transmitted.

At block 906, at least a packet header of the packet is provided to a programmable header alteration engine that includes i) a hardware input processor implemented in hardware and ii) a programmable header alteration processor coupled to a program memory and configured to execute computer readable instructions stored in the program memory to perform one or more header alteration operations on received packets. In an embodiment, the at least the packet header is provided to the programmable header alteration engine 110 of the packet processor 104 of FIG. 1. In this embodiment, the hardware input processor of the programmable header alteration engine corresponds to the input processor 118 of the programmable header alteration engine 110, and the programmable header alteration processor of the header alteration engine corresponds to the programmable header alteration processor 112. In other embodiments, the at least the packet header is provided to a programmable header alteration engine different from the programmable header alteration engine 110.

At block 908, it is determined whether the packet header is to be provided to a processing path coupled to the programmable header alteration processor or to be diverted to a bypass path that bypasses the programmable header alteration processor. In an embodiment, the hardware input processor of the programmable header alteration engine determines whether the packet header is to be provided to a processing path coupled to the programmable header alteration processor or to be diverted to a bypass path that bypasses the programmable header alteration processor. In an embodiment, the bypass decision engine 206 of the input processor 200 of FIG. 2 determines whether the packet is to be provided via a processing path to the programmable header alteration processor or to be diverted to a bypass path that bypasses the header alteration processor. In an embodiment, the method 300 of FIG. 3 is implemented to determine whether the packet is to be provided via a processing path to the programmable header alteration processor or to be diverted to a bypass path that bypasses the header alteration processor. In other embodiments, the determination at block 908 is made in other suitable manners.

At block 910, the packet header is provided to the processing path or to the bypass path based on the determination, made at block 908, of whether the packet header is to be provided to the processing path or to be diverted to the bypass path. In an embodiment, the hardware input processor of the programmable header alteration engine provides the packet header to the processing path or to the bypass path based on the determination, made at block 908, of whether the packet header is to be provided to the processing path or to be diverted to the bypass path.

At block 912, the packet header is selectively i) processed by the programmable header alteration processor when the packet header is provided to the processing path or ii) not processed by the programmable header alteration processor when the packet header is provided to the bypass path. In an embodiment, the programmable header alteration processor 112 of FIG. 1 or the programmable header alteration processor 500 of FIG. 5 selectively processes or does not process the packet header at block 912. In another embodiment, another suitable programmable header alteration processor selectively processes or does not process the packet header at block 912.

Generally, because the method 900 allows packets to dynamically bypass the header alteration processor, the method 900 allows packets that may not require header alteration by the programmable header alteration processor to quickly pass through the programmable header alteration engine, in an embodiment. For example, certain packet flows and/or packet types that do not require header alteration by the programmable header alteration engine are allowed to quickly pass through the programmable header alteration engine, in an embodiment. In some embodiments, certain packets and/or packet flows are allowed to bypass the header alteration processor during times of congestion in the header alteration processor. In at least some embodiments, allowing some packets to dynamically bypass the programmable header alteration processor increases processing power available for processing packets that require header alteration by the programmable header alteration processor as compared to systems that do not utilize dynamic bypass of a header alteration processor.

At block 914, the packet with the packet header selectively processed or not processed at block 912 is transmitted via the at least one egress interface of the network device determined at block 904.

FIG. 10 is a flow diagram illustrating an example method 1000 for processing packets in a network device, according to another embodiment. In an embodiment, the network device 100 implements the method 1000 to process a packet received by the network device 100. Thus, the method 1000 is described with reference to the network device 100 merely for explanatory purposes. In other embodiments, the method 1000 is implemented by another suitable network device.

At block 1002, a packet received via a port of a network device is received by a packet processor of the network device. The packet received at block 1002 includes a packet header and a payload, in an embodiment. In an embodiment, the packet is received by the packet processor 104 of the network device 100 of FIG. 1. In another embodiment, the packet is received by a suitable packet processor different from the packet processor 104 of the network device 100 of FIG. 1 and/or is received by a packet processor of a network device different from the network device 100 of FIG. 1.

At block 1004, at least one egress interface via which the packet is to be transmitted by the network device is determined. In an embodiment, the forwarding engine 106 of the packet processor 104 of FIG. 1 determines the at least one egress interface via which the packet is to be transmitted. In another embodiment, another suitable device (other than the forwarding engine 106) determines the at least one egress interface via which the packet is to be transmitted.

At block 1006, a packet header of the packet is processed with a programmable header alteration processor coupled to a program memory and configured to execute computer readable instructions stored in the program memory to perform one or more header alteration operations on received packets. In an embodiment, the packet header is processed by the programmable header alteration processor 112 of FIG. 1. In another embodiment, the packet header is processed by a suitable programmable header alteration processor different from the programmable header alteration processor 112 of FIG. 1. In an embodiment, processing the packet header at block 1006 includes triggering a hardware checksum accelerator engine to calculate a checksum for a bit string corresponding to at least a portion of the packet header. In an embodiment, processing the packet header at block 1006 includes triggering the hardware checksum accelerator engine 530-3 of FIG. 5. In another embodiment, processing the packet header at block 1006 includes triggering a suitable hardware checksum accelerator engine different from the hardware checksum accelerator engine 530-3 of FIG. 5. In an embodiment, triggering the hardware checksum accelerator engine includes i) partitioning the bit string into a plurality of segments of the bit string and ii) transferring the plurality of segments of the bit string to the hardware checksum accelerator engine.

At block 1008, the checksum is incrementally calculated by the hardware checksum accelerator engine at least by incrementally summing the respective segments of the plurality of segments of the bit string. In an embodiment, incrementally summing the respective segments of the plurality of segments at block 1008 includes summing the plurality of segments in multiple summation stages. In an embodiment, incrementally summing the respective segments of the plurality of segments at block 1008 is performed in the manner described above in connection with the hardware checksum accelerator engine 530-3 of FIG. 5. In another embodiment, incrementally summing the respective segments of the plurality of segments at block 1008 is performed in other suitable manners. In an embodiment, incrementally summing the plurality of segments in multiple summation stages includes storing a result of a previous summation stage in an accumulator, the result of the previous summation stage corresponding to a sum of two or more segments, among the plurality of segments, transferred to the hardware checksum accelerator engine, and, in a subsequent summation stage, summing the result of the previous summation stage, stored in the accumulator, with a subsequent segment, among the plurality of segments, transferred to the hardware checksum accelerator engine.

In at least some embodiments, because respective segments of a bit string are serially transferred to the hardware checksum accelerator engine at block 1006 and are used to incrementally calculate a checksum by the hardware checksum accelerator engine at block 1008, the method 1000 allows for a smaller hardware checksum accelerator engine, in terms of size, power consumption, etc., to be used as compared to a system in which an entire bit string is provided in a same clock cycle to a checksum engine and the checksum is calculated on the entire bit string, particularly when relatively long bit strings need checksum calculation.
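Merely as an illustration, the segment-by-segment summation of blocks 1006 and 1008 can be modeled in software as follows. The sketch assumes 16-bit segments and the one's complement arithmetic of the Internet checksum; the disclosure does not fix a segment width. The names `partition`, `ChecksumAccelerator`, and `checksum` are hypothetical, and the optional inversion models the checksum inversion described elsewhere in this disclosure.

```python
from typing import List

SEGMENT_BITS = 16                 # assumed segment width
MASK = (1 << SEGMENT_BITS) - 1


def partition(bit_string: bytes) -> List[int]:
    """Split the bit string into 16-bit big-endian segments (zero-padded)."""
    if len(bit_string) % 2:
        bit_string += b"\x00"
    return [int.from_bytes(bit_string[i:i + 2], "big")
            for i in range(0, len(bit_string), 2)]


class ChecksumAccelerator:
    """Software model of an accumulator-based checksum engine."""

    def __init__(self) -> None:
        self.accumulator = 0

    def push(self, segment: int) -> None:
        # One summation stage: add the new segment to the running result and
        # wrap the carry bit back into the accumulator in the same step.
        total = self.accumulator + (segment & MASK)
        self.accumulator = (total & MASK) + (total >> SEGMENT_BITS)

    def result(self, invert: bool) -> int:
        # Optionally invert the sum before it is output.
        return self.accumulator ^ MASK if invert else self.accumulator


def checksum(bit_string: bytes, invert: bool = True) -> int:
    engine = ChecksumAccelerator()
    for segment in partition(bit_string):  # segments are transferred serially
        engine.push(segment)
    return engine.result(invert)
```

Because only one segment and the accumulator are live in any stage, the adder in this model is one segment wide no matter how long the bit string is, which is the size advantage noted above.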

At block 1010, the packet with a modified packet header that includes the checksum incrementally calculated at block 1008 is transmitted via the at least one egress interface of the network device determined at block 1004.

In an embodiment, a method for processing packets in a network deviceincludes: receiving, at a packet processor of the network device, apacket received by the network device from a network link; determining,with the packet processor, at least one egress interface via which thepacket is to be transmitted by the network device; providing at least apacket header of the packet to a programmable header alteration engineof the packet processor, the programmable header alteration engineincluding i) a hardware input processor implemented in hardware and ii)a programmable header alteration processor coupled to a program memory,the programmable header alteration processor being configured to executecomputer readable instructions stored in the program memory to performone or more header alteration operations on received packets;determining, with the hardware input processor of the programmableheader alteration engine, whether the packet header is to be provided toa processing path coupled to the programmable header alterationprocessor or to be diverted to a bypass path that bypasses theprogrammable header alteration processor; providing, with the hardwareinput processor of the programmable header alteration engine, the packetheader to the processing path or to the bypass path based on thedetermination of whether the packet header is to be provided to theprocessing path or to be diverted to the bypass path; selectively i)processing the packet header by the programmable header alterationprocessor when the packet header is provided to the processing path andii) not processing the packet header by the programmable headeralteration processor when the packet header is provided to the bypasspath; and transmitting, with the network device, the packet via the atleast one egress interface of the network device.

In other embodiments, the method also includes one of, or any suitablecombination of two or more of, the following features.

The method further comprises determining, with the packet processor, apacket flow to which the packet belongs.

Determining whether the packet header is to be provided to the processing path or to be diverted to the bypass path comprises determining, based at least in part on the packet flow to which the packet belongs, whether the packet header is to be provided to the processing path or to be diverted to the bypass path.

Determining whether the packet header is to be provided to theprocessing path or to be diverted to the bypass path comprisesdetermining, based on one or more statistical attributes associated withthe packet, whether the packet header is to be provided to theprocessing path or to be diverted to the bypass path.

Determining whether the packet header is to be provided to the processing path or to be diverted to the bypass path comprises determining, based at least in part on i) a congestion level of the programmable header alteration processor and ii) one or more congestion handling attributes associated with the packet, whether the packet header is to be provided to the processing path or to be diverted to the bypass path.

The method further comprises, when it is determined that the packetheader is to be diverted to the bypass path, storing the packet headerin a unified buffer in parallel with the programmable header alterationprocessor, the unified buffer configured to temporarily store i) packetheaders of packets that bypass the programmable header alterationprocessor and ii) at least portions of packet headers that do not bypassthe programmable header alteration processor, wherein the portions ofthe packet headers are not needed for processing by the programmableheader alteration processor.

The method further comprises, when it is determined that the packet header is to be provided to the processing path: extracting one or more portions of the packet header to be provided to the programmable header alteration processor, generating a header alteration processor accessible header to include the one or more portions extracted from the packet header, the header alteration processor accessible header being separate from the packet header, providing the header alteration processor accessible header, rather than the packet header, to the programmable header alteration processor, processing, with the programmable header alteration processor, the header alteration processor accessible header, and after processing the header alteration processor accessible header with the programmable header alteration processor, integrating the processed header alteration processor accessible header into the packet header.

The method further comprises generating, with the hardware inputprocessor of the programmable header alteration engine, metadata toinclude at least an indicator of a processing thread, stored in theprogram memory, to be implemented to process the packet header by theprogrammable header alteration processor, and providing, with thehardware input processor of the programmable header alteration engine,the metadata along with the alteration accessible header to theprogrammable header alteration processor.

Providing the metadata along with the alteration accessible header tothe programmable header alteration processor includes splittinginformation comprising the metadata and the alteration accessible headerinto a plurality of chunks, and serially transferring respective chunks,among the plurality of chunks, to the programmable header alterationprocessor, wherein an initial chunk of the plurality of chunkstransferred to the programmable header alteration processor includes atleast the indicator of the processing thread, stored in the programmemory, to be implemented to process the packet by the programmableheader alteration processor.

The method further comprises, prior to receiving an initial portion ofthe alteration accessible header at the programmable header alterationprocessor, retrieving, with the programmable header alterationprocessor, based on the indicator, of the processing thread, included inthe initial chunk of the plurality of chunks, a set of computer readableinstructions from the program memory to be used for processing thealteration accessible header.

The programmable header alteration processor includes a plurality ofprocessing nodes.

The method further comprises, when it is determined that the packetheader is to be provided to the processing path, providing at least aportion of the packet header to a processing node, among the pluralityof processing nodes, for processing of the packet header by theprocessing node, wherein processing of the packet header by theprocessing node is performed in parallel with processing of anotherpacket header, corresponding to another packet received by the networkdevice, by another processing node among the plurality of processingnodes.

Processing the packet header by the programmable header alteration processor includes executing, with the programmable header alteration processor, a set of computer readable instructions retrieved from the program memory, including, during execution of the set of computer readable instructions, triggering one or more hardware accelerator engines to perform one or more specific processing operations with respect to the packet header.

Triggering a hardware accelerator engine among the one or more hardwareaccelerator engines includes issuing a native central processing unit(CPU) instruction of a processing node of the programmable headeralteration processor, and mapping, based on an address indicated in thenative CPU instruction, the native CPU instruction to an acceleratortrigger instruction to trigger the hardware accelerator engine.
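Merely as an illustration, the chunked transfer described in the features above, in which the initial chunk carries the indicator of the processing thread so that instructions can be retrieved before header data arrives, can be sketched as follows. The chunk size, the two-byte thread indicator, the padding of the metadata to fill the initial chunk, and the names `build_chunks` and `HeaderAlterationNode` are assumptions made for this example.

```python
from typing import Dict, List, Optional


def build_chunks(thread_id: int, metadata: bytes, accessible_header: bytes,
                 chunk_size: int = 16) -> List[bytes]:
    """Split metadata plus accessible header into serially transferred chunks.

    Assumption: the initial chunk holds only the thread indicator and the
    metadata (zero-padded), so header data starts in the second chunk.
    """
    first = thread_id.to_bytes(2, "big") + metadata
    if len(first) > chunk_size:
        raise ValueError("metadata does not fit in the initial chunk")
    chunks = [first.ljust(chunk_size, b"\x00")]
    for i in range(0, len(accessible_header), chunk_size):
        chunks.append(accessible_header[i:i + chunk_size])
    return chunks


class HeaderAlterationNode:
    """Toy receiver that fetches instructions when the initial chunk arrives."""

    def __init__(self, program_memory: Dict[int, List[str]]) -> None:
        self.program_memory = program_memory
        self.instructions: Optional[List[str]] = None
        self.header = bytearray()
        self.events: List[str] = []

    def receive_chunk(self, chunk: bytes) -> None:
        if self.instructions is None:
            # Initial chunk: read the thread indicator and fetch the thread's
            # instructions before any header data has been received.
            thread_id = int.from_bytes(chunk[:2], "big")
            self.instructions = self.program_memory[thread_id]
            self.events.append("fetch")
        else:
            self.header += chunk
            self.events.append("data")
```

In this model the instruction fetch is always the first recorded event, so the fetch can overlap with the transfer of the remaining chunks.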

In another embodiment, a network device comprises a packet processorconfigured to i) receive a packet from a network link and ii) determineat least one egress interface via which the packet is to be transmittedby the network device, and a programmable header alteration engineincluding i) a hardware input processor implemented in hardware and ii)a programmable header alteration processor coupled to a program memory,the programmable header alteration processor configured to executecomputer readable instructions stored in the program memory to performone or more header alteration operations on received packets. Thehardware input processor is configured to determine whether a packetheader of the packet is to be provided to a processing path coupled tothe programmable header alteration processor or to be diverted to abypass path that bypasses the programmable header alteration processor,and provide the packet header to the processing path or to the bypasspath based on the determination of whether the packet header is to beprovided to the processing path or to be diverted to the bypass path.The programmable header alteration processor is configured toselectively i) process the packet header when the packet header isprovided to the processing path and ii) not process the packet headerwhen the packet header is provided to the bypass path. The packetprocessor is further configured to cause the packet to be transmittedvia the at least one egress interface of the network device.

In other embodiments, the network device also comprises one of, or anysuitable combination of two or more of, the following features.

The packet processor is further configured to determine a packet flow towhich the packet belongs.

The hardware input processor is configured to determine, based at leastin part on the packet flow to which the packet belongs, whether thepacket header is to be provided to the processing path or to be divertedto the bypass path.

The hardware input processor is configured to determine, based on one ormore statistical attributes associated with the packet, whether thepacket header is to be provided to the processing path or to be divertedto the bypass path.

The hardware input processor is configured to determine, based at least in part on i) a congestion level of the programmable header alteration processor and ii) one or more congestion handling attributes associated with the packet, whether the packet header is to be provided to the processing path or to be diverted to the bypass path.

The hardware input processor is further configured to, when it isdetermined that the packet header is to be provided to the bypass path,store the packet header in a unified buffer in parallel with theprogrammable header alteration processor, the unified buffer configuredto temporarily store i) packet headers of packets that bypass theprogrammable header alteration processor and ii) at least portions ofpacket headers that do not bypass the programmable header alterationprocessor, wherein the portions of the packet headers are not needed forprocessing by the header alteration processor.

The hardware input processor is further configured to, when it is determined that the packet header is to be provided to the processing path: extract one or more portions of the packet header to be provided to the programmable header alteration processor, generate a header alteration processor accessible header to include the one or more portions extracted from the packet header, the header alteration processor accessible header being separate from the packet header, and provide the header alteration processor accessible header, rather than the packet header, to the header alteration processor.

The hardware input processor is further configured to generate metadatacorresponding to the packet, the metadata including at least anindicator of a processing thread, stored in the program memory, to beimplemented to process the packet header by the programmable headeralteration processor, and provide the metadata along with the alterationaccessible header to the header alteration processor.

The hardware input processor is configured to split information comprising the metadata and the alteration accessible header into a plurality of chunks, and serially transfer respective chunks, among the plurality of chunks, to the programmable header alteration processor, wherein an initial chunk of the plurality of chunks transferred to the programmable header alteration processor includes the indicator of the processing thread, stored in the program memory, to be implemented to process the packet header by the programmable header alteration processor.

The programmable header alteration processor is configured to, prior toreceiving an initial portion of the alteration accessible header at theprogrammable header alteration processor, retrieve, from the programmemory based on the indicator, of the processing thread, included in theinitial chunk of the plurality of chunks, a set of computer readableinstructions to be used for processing the alteration accessible header.

The programmable header alteration processor includes a plurality ofprocessing nodes.

The hardware input processor is configured to, when it is determinedthat the packet header is to be provided to the processing path, provideat least a portion of the packet header to a processing node, among theplurality of processing nodes, for processing of the packet header bythe processing node, wherein processing of the packet header by theprocessing node is performed in parallel with processing of anotherpacket header, corresponding to another packet received by the networkdevice, by another processing node among the plurality of processingnodes.

The programmable header alteration processor is configured to execute a set of computer readable instructions, retrieved from the program memory, to process the packet header, the programmable header alteration processor being configured to, during execution of the set of computer readable instructions, trigger one or more hardware accelerator engines to perform one or more specific processing operations with respect to the packet header, wherein the header alteration processor is configured to trigger an accelerator engine among the one or more accelerator engines at least by issuing a native CPU instruction of a processing node of the header alteration processor, wherein the native CPU instruction is mapped, based on an address indicated in the native CPU instruction, to an accelerator trigger instruction to trigger the hardware accelerator engine.
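Merely as an illustration, the address-based mapping of a native CPU instruction to an accelerator trigger instruction can be modeled as follows. The sketch treats the native instruction as a store and maps it to a trigger when its address falls within a window assigned to an accelerator engine. The store-based model, the window addresses, and the name `Bus` are assumptions introduced for this example; the disclosure does not specify which native instruction or which address ranges are used.

```python
from typing import Callable, Dict, List, Tuple

# An accelerator trigger takes (offset within the window, value).
Trigger = Callable[[int, int], None]


class Bus:
    """Toy model of the address decode between a processing node and memory."""

    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}
        self.windows: List[Tuple[int, int, Trigger]] = []

    def map_accelerator(self, base: int, size: int, trigger: Trigger) -> None:
        # Reserve the address range [base, base + size) for one accelerator.
        self.windows.append((base, size, trigger))

    def store(self, address: int, value: int) -> str:
        """Execute a native store; return where the store was routed."""
        for base, size, trigger in self.windows:
            if base <= address < base + size:
                # The address selects the accelerator, so the ordinary store
                # becomes an accelerator trigger instead of a memory write.
                trigger(address - base, value)
                return "accelerator"
        self.memory[address] = value
        return "memory"
```

One consequence of this approach, as modeled here, is that the processing node needs no special instruction for accelerators: the same store instruction reaches either memory or an accelerator engine depending only on the address it indicates.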

In yet another embodiment, a method for processing packets in a network device includes: receiving, at a packet processor of the network device, a packet received by the network device from a network link; determining, with the packet processor, at least one egress interface via which the packet is to be transmitted by the network device; processing a packet header of the packet with a programmable header alteration processor coupled to a program memory, the programmable header alteration processor being configured to execute computer readable instructions stored in the program memory to perform one or more header alteration operations on received packets, the processing including triggering a hardware checksum accelerator engine to calculate a checksum for a bit string corresponding to at least a portion of the packet header, wherein triggering the hardware checksum accelerator engine includes i) partitioning the bit string into a plurality of segments of the bit string and ii) transferring the plurality of segments of the bit string to the hardware checksum accelerator engine; incrementally calculating, with the hardware checksum accelerator engine, the checksum at least by incrementally summing the respective segments, among the plurality of segments of the bit string, transferred to the hardware checksum accelerator engine; and transmitting, via the at least one egress interface of the network device, the packet with a modified header that includes the checksum.

In other embodiments, the method also includes one of, or any suitablecombination of two or more of, the following features.

Triggering the hardware checksum accelerator engine includes generating,with the programmable header alteration processor, respective triggerinstructions to transfer respective segments of the plurality ofsegments of the bit string to the hardware checksum accelerator engine,and serially providing the respective trigger instructions to thehardware checksum accelerator engine.

Triggering the hardware checksum accelerator engine includes issuing oneor more native central processing unit (CPU) instructions supported bythe programmable header alteration processor, and mapping, based on anaddress indicated in the one or more native CPU instructions, each ofthe one or more native CPU instructions to an accelerator triggerinstruction for triggering the hardware checksum accelerator engine.

Incrementally calculating the checksum includes incrementally summing,with the hardware checksum accelerator engine, the plurality of segmentsin multiple summation stages, including storing a result of a previoussummation stage in an accumulator, the result of the previous summationstage corresponding to a sum of two or more segments, among theplurality of segments, transferred to the hardware checksum acceleratorengine, and in a subsequent summation stage, summing the result of theprevious summation stage, stored in the accumulator, with a subsequentsegment, among the plurality of segments, transferred to the hardwarechecksum accelerator engine.

Incrementally calculating the checksum further includes wrapping one ormore carry bits generated in the previous summation stage back into theaccumulator in a same clock cycle in which the previous summation stageis performed by the hardware checksum accelerator engine.

The method further comprises determining, with the hardware checksumaccelerator engine based on information provided by the programmableheader alteration processor to the hardware checksum accelerator engine,whether or not the checksum is to be inverted, and in response todetermining that the checksum is to be inverted, inverting the checksumprior to outputting the checksum from the hardware checksum acceleratorengine.

Triggering the hardware checksum accelerator engine to calculate thechecksum for the bit string includes providing, to the hardware checksumaccelerator engine, an indication of a memory location in a memory towhich the checksum is to be written.

The method further comprises causing, with the hardware checksum accelerator engine, the checksum to be written to the indicated memory location in the memory.

Triggering the hardware checksum accelerator engine to calculate thechecksum for the bit string comprises triggering the hardware checksumaccelerator engine to calculate the checksum for a bit stringcorresponding to one of i) an internet protocol (IP) header included inthe packet header, ii) a user datagram protocol (UDP) header included inthe packet header or iii) a generic UDP encapsulation (GUE) headerincluded in the packet header.

In still another embodiment, a network device comprises a packet processor configured to i) receive a packet from a network link, the packet including a packet header and a payload and ii) determine at least one egress interface via which the packet is to be transmitted by the network device, and a programmable header alteration processor coupled to a program memory, the programmable header alteration processor configured to execute computer readable instructions stored in the program memory to perform one or more header alteration operations on packet headers of received packets, the programmable header alteration processor being configured to, during processing of a packet header of a received packet, trigger a hardware checksum accelerator engine to calculate a checksum for a bit string corresponding to at least a portion of the packet header. The programmable header alteration processor is configured to partition the bit string into a plurality of segments of the bit string, and transfer the plurality of segments of the bit string to the hardware checksum accelerator engine. The hardware checksum accelerator engine is configured to incrementally calculate the checksum at least by incrementally summing the respective segments, among the plurality of segments of the bit string, transferred to the hardware checksum accelerator engine. The packet processor is further configured to cause the packet with a modified header that includes the checksum to be transmitted via the at least one egress interface of the network device.

In other embodiments, the network device also comprises one of, or anysuitable combination of two or more of, the following features.

The programmable header alteration processor is configured to generaterespective trigger instructions to transfer respective segments of theplurality of segments of the bit string to the hardware checksumaccelerator engine, and serially provide the respective triggerinstructions to the hardware checksum accelerator engine.

The programmable header alteration processor is configured to triggerthe hardware checksum accelerator engine at least by issuing one or morenative CPU instructions supported by the programmable header alterationprocessor, wherein each of the one or more native CPU instructions ismapped, based on an address indicated in the native CPU instruction, toan accelerator trigger instruction for triggering the hardware checksumaccelerator engine.

The hardware checksum accelerator engine is configured to incrementallycalculate the checksum at least by summing the plurality of segmentsover multiple summation stages.

The hardware checksum accelerator engine is configured to store a result of a previous summation stage in an accumulator, the result of the previous summation stage corresponding to a sum of two or more segments, among the plurality of segments, transferred to the hardware checksum accelerator engine, and in a subsequent summation stage, sum the result of the previous summation stage, stored in the accumulator, with a subsequent segment, among the plurality of segments, transferred to the hardware checksum accelerator engine.

The hardware checksum accelerator engine is further configured to wrapone or more carry bits generated in the previous summation stage backinto the accumulator in a same clock cycle in which the previoussummation stage is performed by the hardware checksum acceleratorengine.

The hardware checksum accelerator engine is further configured todetermine, based on information provided by the programmable headeralteration processor to the hardware checksum accelerator engine,whether or not the checksum is to be inverted, and in response todetermining that the checksum is to be inverted, invert the checksumprior to outputting the checksum.

The programmable header alteration processor is further configured toprovide, to the hardware checksum accelerator engine, an indication of amemory location in a memory to which the checksum is to be written.

The hardware checksum accelerator engine is further configured to causethe checksum to be written to the indicated memory location in thememory.

The programmable header alteration processor is configured to triggerthe hardware checksum accelerator engine to calculate the checksum forone of i) an internet protocol (IP) header included in the packetheader, ii) a user datagram protocol (UDP) header included in the packetheader or iii) a generic UDP encapsulation (GUE) header included in thepacket header.

At least some of the various blocks, operations, and techniques described above may be implemented utilizing hardware, a processor executing firmware instructions, a processor executing software instructions, or any combination thereof.

When implemented in hardware, the hardware may comprise one or more of discrete components, an integrated circuit, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), etc.

When implemented utilizing a processor executing software or firmware instructions, the software or firmware instructions may be stored in any computer readable memory such as on a magnetic disk, an optical disk, or other storage medium, in a RAM or ROM or flash memory, processor, hard disk drive, optical disk drive, tape drive, etc. The software or firmware instructions may include machine readable instructions that, when executed by one or more processors, cause the one or more processors to perform various acts.

While the present invention has been described with reference to specific examples, which are intended to be illustrative only and not to be limiting of the invention, changes, additions and/or deletions may be made to the disclosed embodiments without departing from the scope of the invention. For example, one or more portions of methods or techniques described above may be performed in a different order (or concurrently) and still achieve desirable results.

What is claimed is:
1. A method for processing packets in a network device, the method comprising: receiving, at a packet processor of the network device, a packet received by the network device from a network link; determining, with the packet processor, at least one egress interface via which the packet is to be transmitted by the network device; providing at least a packet header of the packet to a programmable header alteration engine of the packet processor, the programmable header alteration engine including i) a hardware input processor implemented in hardware and ii) a programmable header alteration processor coupled to a program memory, the programmable header alteration processor being configured to execute computer readable instructions stored in the program memory to perform one or more header alteration operations on received packets; determining, with the hardware input processor of the programmable header alteration engine, whether the packet header is to be provided to a processing path coupled to the programmable header alteration processor or to be diverted to a bypass path that bypasses the programmable header alteration processor; providing, with the hardware input processor of the programmable header alteration engine, the packet header to the processing path or to the bypass path based on the determination of whether the packet header is to be provided to the processing path or to be diverted to the bypass path; selectively i) processing the packet header by the programmable header alteration processor when the packet header is provided to the processing path and ii) not processing the packet header by the programmable header alteration processor when the packet header is provided to the bypass path; and transmitting, with the network device, the packet via the at least one egress interface of the network device.